ABSTRACT

Consider a machine with a cellular memory used to store a list $x \in X^i$, where $X$ is
a finite alphabet and $i \in \mathbb{N}$. We investigate the machine representation of such a
list and the implementation of common list operations such as determining the k-th
element and adding or deleting an element. Information-theoretic arguments are
used in order to obtain lower bounds on storage and access costs for implementing
variable-length lists and, in particular, stacks. Representations are discussed which
attain these bounds separately and can sometimes attain both, although it is shown

On the constructive side, we show that it is possible to implement a stack of any
finite length so as to achieve Kraft storage and so that the number of memory cell
accesses required to perform a PUSH or a TOP operation is always O(log n), but
where, assuming a nonincreasing probability distribution on stack lengths, a POP
operation requires on the average only a constant number of accesses.

CHAPTER 1

INTRODUCTION 

With the present-day widespread use of computers, it is important to be able 
to efficiently store information and execute operations. For a given problem, 
depending on the structural relationships between the data elements, we choose to 
use a particular type of data structure. In this thesis, we shall consider only the 
simplest information structure, a list; in particular, we discuss stacks and briefly 
mention some work with queues. 

1.1 The Data Model 

The data model I will use for studying list structures is based on the model of 
a storage and retrieval problem developed by Elias [5] and Welch [23]. A retrieval
problem consists of a collection of data bases, any one of which may be observed at 
a given time, and a set of retrieval questions which may be asked of any data base. 
It may also be desired to perform updates; i.e., to transform the currently observed 
data base into some other data base from the domain, the set of possible data bases. 

A retrieval system which solves a retrieval problem must have several 
components: 

(1) a method of representing any observed data base, 

(2) a method for answering any retrieval question about the observed data 
base, 

(3) a method for performing updates on the observed data base. 

For a given question, the method for answering the question must be independent 
of the observed data base; to allow the method to depend on the observed data 
base would presuppose some knowledge of the observed data base by the user in 
order to determine which method is appropriate. Thus, the method must give the 
correct answer no matter what the current data base is. 

The following example illustrates what we mean by a storage and retrieval
problem. We delay discussion of how a data base might be stored and how a query 
or update might be implemented until the next section, where we shall reconsider 
this example. 

Example 1.1. Consider the problem of Rotary Fan Manufacturing Co., R.F.M., 
receiving mail orders for fans. Somehow R.F.M. must keep track of these orders to 
be filled. Exactly what information is needed depends on the questions and updates 
that will be executed. A data base corresponds to the current list of orders to be 
filled. The domain is the set of all possible data bases; i.e., all possible lists of fan 
orders. Notice that there are data bases of different sizes; in fact, it may be 
possible for a data base to have any integral size greater than or equal to zero. Of
course, if R.F.M. wants to stay in business for long it had better be the case that 
shorter data bases are more probable than larger ones. 

Because old orders are continually being filled and new orders received, it 
must be possible to update the current order list; in particular, R.F.M. needs to be 
able to perform the following two updates. 

$u_1$: Process an order from the order list. This involves mailing the desired
fans and deleting the order from the current list. Thus, the size of the
data base is decremented by one.
$u_2$: A new order arriving must be placed on the order list, which results in
the size of the current data base being incremented by one.

R.F.M. must also be prepared to answer queries concerning the current data
base, such as whether or not John Doe's order is on the list, or whose order will be
filled next. For instance, we might have the following set of questions.

$q_1$: Is (name) a customer waiting to have his order processed?
$q_2$: Who is the k-th customer in line; i.e., what order will be the k-th to be
served?
$q_3$: What are the $l$ most recently placed orders?

Exactly what information needs to be stored on the order list depends on the
particular queries that will be made.

1.2 List Structures

We consider a list problem to be a type of storage and retrieval problem, 
where each data base is a particular list. In general the size of the list may vary, 
and exactly how the list will be implemented depends on the specific questions and 
updates to be performed. In this section we introduce the basic list structures we 
will be concerned with in this thesis: stacks, queues, and dequeues. The 
appropriate operations will be formally defined later. 

A linear list is just an ordered sequence of items chosen from a particular set 
of elements (see e.g. Knuth [14], Aho, Hopcroft, and Ullman [1]). In many
instances, accessing of a list is restricted to the first and last elements; in particular, 
it may be the case that items can be added or deleted only at the ends of the list. 
Because these lists are frequently encountered, they have special names: stacks, 
queues, dequeues. 

A stack, also known as a push-down store or a LIFO (last-in/first-out) list,
is a linear list for which all insertions and deletions are made at one end of the list, 
the top. For example, consider an initially empty stack; i.e., there are no elements 
in the list. Suppose we then insert two elements onto the stack: 

Element 1, Element 2. 
Since Element 1 was the first item put onto the stack, it occupies the bottom stack 
position and is the least accessible item; it cannot be removed until all other 
elements on the stack have been removed. To add, PUSH, a third element onto the 
stack, we locate the top of the stack and insert this new element, Element 3: 

Element 1, Element 2, Element 3. 
Element 3 is now at the top of the stack, and so if we delete, POP, an element 
from the stack, we are left with: 

Element 1, Element 2. 
Of course, if the stack had been empty we would not have been able to perform a
POP operation, so there must be some way of detecting an empty stack. 

Exactly how one might choose to implement a stack is one of the issues 
discussed in this thesis. Figure 1.1 should help picture how the stack operations 
work and corresponds to one common type of implementation, where each item in 
the stack has a pointer which indicates the location of the previous stack item. An 
additional pointer always points to the top of the stack. Such a storage arrangement 
allows the stack operations to be performed in a straightforward way. In particular, 
a TOP operation is performed by reading the pointer in order to locate the top of 
the stack and then simply reading what the TOP value is. To perform a POP we 
locate the top of the stack, use this element to locate the second stack element, and 
then reset the top of stack pointer to this second element, which becomes the TOP 
element. Similarly, a PUSH operation can be implemented by first locating some 
free memory cell, into which the appropriate new stack value is inserted. This new 
cell has a pointer which is set to the same location as the top of stack pointer, and 
then the top of stack pointer is changed so that it points to the newly filled cell, our 
new top of stack. The pointers involved in these implementations are indicated in 
Figure 1.1. Notice that the directions of the pointers between the stack elements 
make reading "down" the stack straightforward, but there would be no way to read 
back "up" the stack. Of course, if the stack occupied a contiguous section of 
memory, there would be no need at all for pointers between the stack elements. 

Figure 1.1. Stack Operations
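
The pointer arrangement described above can be sketched in Python. This is only an illustration under stated assumptions, not one of the representations analyzed later in the thesis: the names `Cell` and `LinkedStack` are hypothetical, Python object references stand in for memory-cell addresses, and `pop` returns the removed value as a convenience that the text does not require.

```python
class Cell:
    """One stack item: a value plus a pointer to the previous stack item."""

    def __init__(self, value, below):
        self.value = value
        self.below = below  # previous stack item, or None at the bottom


class LinkedStack:
    """Hypothetical linked stack with a single top-of-stack pointer."""

    def __init__(self):
        self.top_ptr = None  # the additional pointer to the top of the stack

    def is_empty(self):
        # Some way of detecting an empty stack is needed before POP or TOP.
        return self.top_ptr is None

    def top(self):
        # TOP: follow the top pointer and read the value stored there.
        if self.is_empty():
            raise IndexError("TOP on an empty stack")
        return self.top_ptr.value

    def push(self, value):
        # PUSH: fill a free cell, point it at the old top, then move the
        # top pointer to the newly filled cell.
        self.top_ptr = Cell(value, self.top_ptr)

    def pop(self):
        # POP: use the top element to locate the second element, then
        # reset the top pointer to that second element.
        if self.is_empty():
            raise IndexError("POP on an empty stack")
        value = self.top_ptr.value
        self.top_ptr = self.top_ptr.below
        return value
```

Because each cell points only to the item beneath it, the sketch can read down the stack but has no way to read back up, which is the limitation noted in the text.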

A queue, also known as a FIFO (first-in/first-out) list, or a circular list, is a
linear list for which all insertions are made at one end of the list, the rear, and all
deletions are made at the other, the front. Thus, elements leave the list in the same
order in which they entered. Suppose we insert, ENQUEUE, three elements onto an
initially empty queue, first Element 1, then Element 2, then Element 3:

Element 3, Element 2, Element 1. 
If we now delete, DEQUEUE, one element, we are left with: 

Element 3, Element 2. 
Figure 1.2 illustrates the queue operations. Notice that if the arrows between 
elements in Figure 1.2 were reversed, then after performing a DEQUEUE operation 
we would have no way to keep track of the location of the front of the queue. Of 
course, we might choose to store pointers going in both directions, but this would 
involve greater storage costs. 

Figure 1.2. Queue Operations
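
A minimal Python sketch of such a queue follows. It assumes, as the remark about reversed arrows suggests, that each element points toward the rear (to the element that entered after it) and that separate front and rear pointers are kept. The names `QCell` and `LinkedQueue` are hypothetical, and `dequeue` returns the removed value only for convenience.

```python
class QCell:
    """One queue item with a pointer to the item that entered after it."""

    def __init__(self, value):
        self.value = value
        self.behind = None  # next item toward the rear, or None at the rear


class LinkedQueue:
    """Hypothetical linked queue with front and rear pointers."""

    def __init__(self):
        self.front = None  # next element to be deleted
        self.rear = None   # most recently inserted element

    def enqueue(self, value):
        # ENQUEUE: link the new cell behind the current rear element.
        cell = QCell(value)
        if self.rear is None:
            self.front = cell
        else:
            self.rear.behind = cell
        self.rear = cell

    def dequeue(self):
        # DEQUEUE: the front element's pointer locates the new front.
        if self.front is None:
            raise IndexError("DEQUEUE on an empty queue")
        value = self.front.value
        self.front = self.front.behind
        if self.front is None:
            self.rear = None
        return value
```

With the pointers in this direction a DEQUEUE always finds the new front in one step. Reading from the rear toward the front is not possible without additional reverse pointers, which is the storage trade-off mentioned above.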



A dequeue is a linear list for which all insertions and deletions are made at the
ends of the list. Thus, a stack and a queue can each be viewed as a particular type 
of dequeue. One may also distinguish output-restricted or input-restricted dequeues, 
in which deletions or insertions, respectively, are allowed to take place at only one 
end. The ends are commonly referred to as left and right, although either an 
insertion or a deletion may occur at either end (see Figure 1.3). We shall not in 
this report discuss any results specifically concerning dequeues, but it appears that a 
dequeue can be viewed as a straightforward extension of a queue.

Figure 1.3. Dequeue Operations

Now that we have discussed these simple list structures, let us reconsider the 
issue of developing a solution to the system of Example 1.1. 



Example 1.2. How R.F.M. develops a system to solve its order problem depends not 
only on finding an efficient means to store any data base, but also on what queries
and updates it expects to be making most often. Thus, finding an "optimal" solution 
would depend on knowing some rather precise probabilities. On the other hand, 
we can at least make some general comments. The representation of a data base 
must include the names of the persons who ordered fans, as well as the other 
necessary information such as quantity ordered, address, payment, etc. It would
probably make sense to store a data base as some sort of list structure. For 
simplicity, let us consider only a list of names and assume that each name also
contains a pointer to the relevant corresponding information. In other words, we 
access any element in the list by reading the appropriate name. We have decided 
that each data base is to be represented by a list structure, but the type would be 
determined by R.F.M.'s desired processing order. Let us discuss several possible 
implementations. 

One reasonable scheme would be to process orders FIFO; i.e., in the same
order in which they arrived. This would correspond to implementing some sort of
queue, perhaps as in Figure 1.2. In this case we always keep track of the next
order to be processed and the last order received. Presumably, updates $u_1$ and $u_2$
would be easy to perform. On the other hand, returning the answer to question $q_1$
requires searching the queue for a particular name. Unless we have more
information, this could require searching through the entire list. For a queue
implementation, it would probably be straightforward to answer question $q_2$, by
tracing backward k items from the front. On the other hand, $q_3$ would probably
be difficult to answer. To determine the one most recently placed order would
require only a single access to the rear of the queue. But to determine the second
most recently placed order is not as easy. Unless there is some way of knowing the
"reverse pointers", it would be necessary to read all items from the front,
keeping track of each previous item read, until we reach the rear of the queue. Of
course, if we expected $q_3$ to be asked frequently, we might wish to alter our
implementation scheme and store both forward and reverse pointers. At the price
of increased storage, we could decrease the expense of answering $q_3$.

Another possible scheme would be to try to process orders as they are 
received, using a stack representation. Of course, R.F.M. Co. might lose a lot of 
business this way, because if it gets at all behind in processing orders, then some 
poor souls would be stuck indefinitely at the bottom of the stack. (And R.F.M. 
hasn't even considered the issue of cancelling an order from the middle of the list!) 
With such a FILO implementation, we would expect $q_3$ to be easier to answer than
it was with a queue implementation, but now $q_2$ doesn't even make sense, because
there is no way to know when an order will be processed. Question $q_1$ would
probably be no more or less difficult than it was for the queue.

If we expected to spend most of our time answering question $q_1$, we might
want to sort the list of names alphabetically. (This would also make it easier to
cancel an order.) But then we would need some additional means of indicating the
processing order, such as a number field associated with the name. Unless we want
to mail out the fans according to some alphabetical order, we would either need
pointers to indicate the processing order or else updates might be very expensive.

1.3 Computer-Implemented List Problems

In this thesis we are concerned with computer-implemented solutions of list 
problems. Recall that in Section 1.1 we mentioned the three components that any 
such system must possess. Note that requiring the algorithmic method for answering 
a question (or performing an update) be independent of the observed data base 
implies a strict separation of "program" and "data". The "program" to answer a 
question must remain constant, while presumably the computer memory state 
(representing the observed "data") differs for different observed data bases. 

A computing system which finds the values of a function $f: D_f \to R_f$ can be
viewed information-theoretically as a deterministic communications channel with
input $d \in D_f$ and output value $f(d) \in R_f$. In [6], Elias considered the strictly
informational limits on computer performance and obtained lower bounds on 
storage and access required in the computation of a single function. This was done 
by allowing freedom of choice of representation of the input and decoding of the 
output. Viewing the contents of a computer's memory as a codeword, Elias [7] 
dealt with questions about the use of codewords which are not sequences but are 
sets of bits at addresses scattered throughout a shared memory. The next step was 
to extend these results to the computation of a family of functions defined on a 
common domain. An overview of much of this work is given by Elias [9], and an
analysis of the complexity of some simple retrieval problems with update was given 
by Elias and Flower [10]. Warner [22] has investigated the performance of 
retrieval systems for tables of entries. 

Let us note that information-theoretic approaches have been taken to other 
problems as well. The work of Kolmogorov [15] using minimal program length as a
measure of computational complexity has an informational flavor. Also Chaitin [4]
viewed the contents of memory as a program to be executed. Other work has been 
done relating to problems of exact and partial match and their storage and access 
costs (Minsky and Papert [19], Rivest [20], [21]). 

This thesis extends work that Elias has done, in which he has considered
many issues concerned with storage and retrieval problems using a fixed size linear 
array. To allow the natural representation and manipulation of data, variable size 
arrays such as stacks, queues, dequeues, lists, and trees are frequently used. The 
fact that they have variable size makes different storage representations and 
accessing techniques appropriate; for instance, we must consider the basic 
operations of insertion of new elements and deletion of existing elements. 

We are interested in investigating certain costs associated with solving 
computer-implemented list problems. In particular, we are concerned with lower
bounds on the cost of storing a data base and on the cost of implementing a 
question or an update on the currently observed data base. The storage cost we 
measure in terms of the number of memory cells required for the data base 
representation. The implementation cost we measure in terms of the number of 
memory accesses required, which is in general directly related to the time taken to 
perform an operation. 

We begin in Chapter 2 by discussing the formalism of our machine model and
what it means to solve a list problem. Chapter 3 discusses storage and access costs 
and explains the notions of Kraft storage and access, indicating the types of cost 
bounds we might expect to obtain. In Chapter 4 we consider the entire set of table 
lookup questions and investigate consequences of achieving Kraft storage and access. 
Possible implementations for the table lookup question set are explored in Chapter 
5, where we discuss three types of representations: fixed length, endmarker, and 
pointer. These same representation classes are analyzed in Chapter 6 with respect to 
implementing stacks. Finally, we summarize our results, discuss how the techniques
we have developed can also be used to help obtain storage and access bounds for 
queues and dequeues, and point out directions for future work. 

CHAPTER 2

SOLUTION OF A LIST PROBLEM 

In this chapter we discuss our formal machine model and what it means to 
solve a list problem. This work is based on the model of a storage and retrieval
problem developed by Elias [5], [6], [8]. We shall here introduce much of the
terminology and notation that is used throughout the thesis. We first define a
storage and retrieval problem, and then define our machine model and what it
means for a machine to answer a question correctly. We discuss the distinction 
between the problem and machine domains and then define the machine 
representation of a problem domain. At this point we are finally in a position to 
state precisely what it means for a machine to solve a storage and retrieval problem. 
In the last section we summarize some of the ideas presented in the chapter, in 
order to clarify what we mean by the solution of a list problem. 

2.1 Definition of a Storage and Retrieval Problem

Let $F$ be a family of functions (operations) defined on a common domain $\mathbb{D}$,
and indexed by some index set $J \subseteq \mathbb{N}$: $F = \{f_i \mid i \in J\}$. An operation $f_i \in F$ is an
ordered pair of functions $f_i = (q_i, u_i)$, where $\mathrm{dom}(f_i) = \mathbb{D}$ and $\mathrm{ran}(u_i) \subseteq \mathbb{D}$. We
refer to an element $d \in \mathbb{D}$ as a data base. Executing operation $f_i$ on data base
$d \in \mathbb{D}$ returns the value $q_i(d)$ and has the side effect of updating $d$ to the new
value $u_i(d)$; we denote this by $f_i(d) = (q_i(d), u_i(d))$. $Q = \{q_i \mid (q_i, u_i) \in F\}$ is
called the question set and $U = \{u_i \mid (q_i, u_i) \in F\}$ is called the update set of $F$. We
refer to $(F, \mathbb{D})$ as a storage and retrieval problem. If no data base $d$ is
changed as a consequence of executing any $f_i$ (i.e., if $u_i(d) = d$ for all $i$ and $d$), then $(F, \mathbb{D})$ is said
to be a static problem, and we may write it as $(Q, \mathbb{D})$. In general, however, the
data base may change with time, in which case $(F, \mathbb{D})$ is a dynamic problem, or a
problem with update.

In this thesis, we shall consider storage and retrieval problems which represent
list data structures; we refer to these as list problems, or simply problems. In Section 
2.7 we will be in a better position to explain precisely what we have in mind when 
we discuss the solution of a list problem. Let us begin by presenting a simple 
example of a storage and retrieval problem, which will illustrate some of the above 
terminology. Examples 2.2, 2.3, and 2.7 are extensions of this example.

Example 2.1. Let $\mathbb{D} = \{d_i \mid 0 \le i \le 6\}$, where each $d_i \in \mathbb{D}$ is a string of symbols
from the set $X = \{0,1\}$; i.e., each $d_i \in X^*$:

    $d_0 = \lambda$      $d_4 = 01$
    $d_1 = 0$            $d_5 = 10$
    $d_2 = 1$            $d_6 = 11$
    $d_3 = 00$

Note that we write $d_0 = \lambda$ to indicate that $d_0$ is the null string, the string with no
elements. Now consider two operations on $\mathbb{D}$, $f_1$ and $f_2$. The function
$f_1 = (q_1, u_1)$ is simply the identity question and update:

    $q_1(d_i) = d_i$,
    $u_1(d_i) = d_i$.

Since $u_1$ causes no change to the data base $d_i$, $f_1$ effectively has only a question
component and so is a static operation. We define $f_2$, however, to be a dynamic
operation: $f_2 = (q_2, u_2)$, where

    $q_2(d_0) = a_0$                $u_2(d_0) = d_0$
    $q_2(d_1) = a_1$                $u_2(d_1) = d_0$
    $q_2(d_2) = a_1 a_1$            $u_2(d_2) = d_1$
    $q_2(d_3) = a_1 a_0 a_0$        $u_2(d_3) = d_1$
    $q_2(d_4) = a_1 a_2 a_1$        $u_2(d_4) = d_2$
    $q_2(d_5) = a_2 a_2 a_1$        $u_2(d_5) = d_2$
    $q_2(d_6) = a_1 a_1 a_0 a_0$    $u_2(d_6) = d_3$

Thus, executing the operation $f_2$ on data base $d_3$ gives the answer $a_1 a_0 a_0$ and
changes the current data base, $d_3$, to the data base $d_1$. Notice that
$\mathrm{dom}(f_1) = \mathrm{dom}(f_2) = \mathbb{D}$, $\mathrm{ran}(u_1) = \mathbb{D}$, and $\mathrm{ran}(u_2) = \{d_0, d_1, d_2, d_3\} \subset \mathbb{D}$.
So if we were to execute the sequence of operations $f_2, f_1, f_2, f_2$ on $d_3$, then we
would expect the sequence of answers to be $a_1 a_0 a_0$, $d_1$, $a_1$, $a_0$ and the resulting
data base to be $d_0$.
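
The operation sequence of Example 2.1 can be traced with a short Python sketch. The encoding is an assumption made here for illustration: data bases are referred to by their indices 0 to 6, an answer such as $a_1 a_0 a_0$ is written as the tuple `(1, 0, 0)`, and the names `Q2`, `U2`, `f1`, `f2`, and `run` are hypothetical.

```python
# Question q_2 as a table: index i of d_i -> answer as a tuple of a-indices.
Q2 = {
    0: (0,),
    1: (1,),
    2: (1, 1),
    3: (1, 0, 0),
    4: (1, 2, 1),
    5: (2, 2, 1),
    6: (1, 1, 0, 0),
}

# Update u_2 as a table: index of d_i -> index of the new data base.
U2 = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}


def f1(i):
    """Static operation: the answer is the data base itself, no change."""
    return f"d_{i}", i


def f2(i):
    """Dynamic operation: answer q_2(d_i), then move to u_2(d_i)."""
    return Q2[i], U2[i]


def run(operations, i):
    """Execute operations in order, starting from data base d_i.

    Returns the list of answers and the index of the final data base.
    """
    answers = []
    for op in operations:
        answer, i = op(i)
        answers.append(answer)
    return answers, i
```

Calling `run([f2, f1, f2, f2], 3)` reproduces the answers $a_1 a_0 a_0$, $d_1$, $a_1$, $a_0$ and ends at $d_0$, matching the sequence stated in the example.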

We frequently denote the domain of a function, $\mathrm{dom}(f)$, by $D_f$ and similarly
$\mathrm{ran}(f)$ by $R_f$. Where we have a set $F = \{f_i \mid i \in J\}$ of operations, we may find it
convenient to write $D_i$ and $R_i$ for $D_{f_i}$ and $R_{f_i}$, respectively. If there is no
possibility of confusion, we may simply omit the subscripts and write $D$ and $R$.
For instance, $D(S)$ denotes the domain of the set $S$. Note that when we discuss a
problem $(F, \mathbb{D})$, we write $\mathbb{D}$ to refer to the problem domain, which happens to be
the common domain of each function $f \in F$.

2.2 Definition of the Machine Model 

Our machine model is a deterministic, sequential, random access,
cell-addressable, halting automaton $\mathcal{M}$, with a memory $m$ consisting of $L$ cells
(where $L$ may be infinite). The set of all possible contents of a memory cell, $\mathcal{B}$,
corresponds to $\mathcal{M}$'s finite input alphabet, and $\mathcal{B}^L$ denotes the set of possible memory
states. Via its memory, $\mathcal{M}$ stores a sequence $b \in \mathcal{B}^L$, which it reads in some order
determined by the structure of $\mathcal{M}$ and the values in $b$. $\mathcal{M}$ may or may not rewrite
values as it reads the cells, but it eventually prints a sequence of output symbols
chosen from some finite output alphabet $\mathcal{E}$. Since $\mathcal{M}$ is deterministic, a given input
(initial state of memory) always causes $\mathcal{M}$ to print the same output (if $\mathcal{M}$ halts), so
$\mathcal{M}$ computes a partial function $\omega$ from inputs in $\mathcal{B}^L$ to outputs in $\mathcal{E}^*$. If we let
$\mathcal{D}(\mathcal{M}) \subseteq \mathcal{B}^L$ be the set of inputs for which $\mathcal{M}$ halts in finite time and $\mathcal{R}(\mathcal{M}) \subseteq \mathcal{E}^*$
be the set of outputs which $\mathcal{M}$ prints before it halts, then each automaton $\mathcal{M}$ defines
a "characteristic function" $\omega: \mathcal{D}(\mathcal{M}) \to \mathcal{R}(\mathcal{M})$. The only functions which $\mathcal{M}$ can
actually compute are restrictions of its characteristic function to some subset of its
acceptance set.

2.3 Machine Computation of a Static Function

Now that we have in mind a machine definition, let us investigate in what
sense a machine $\mathcal{M}$ with $L$ memory cells can compute a static function $q: D_q \to R_q$.
Technically, a machine $\mathcal{M}$ can compute the values of a question $q: D_q \to R_q$ only
when $D_q \subseteq \mathcal{D}(\mathcal{M})$ and $R_q \subseteq \mathcal{R}(\mathcal{M})$. It is often claimed, however, that a machine
$\mathcal{M}$ computes a function $q$ even when the machine alphabets and the problem
alphabets are not identical. In such a case, the user also has in mind two
non-machine components: a coder and a decoder. The coder consists of some
encoding relation $\tau: D_q \to \mathcal{D}_q$, from the domain of $q$ onto a subset $\mathcal{D}_q \subseteq \mathcal{D}(\mathcal{M})$;
each $d \in D_q$ is taken into a subset $\tau(d) \subseteq \mathcal{B}^L$, and any string $b \in \tau(d)$ is said to
"represent" $d$. (We shall later use the symbol $\rho$ to stand for an encoding function,
as explained in Sections 2.5 and 2.6. Using that terminology, our encoding relation
$\tau$ will be seen to correspond to a relation $\rho$.) The decoding function $\delta: \mathcal{R}_q \to R_q$
maps the subset $\mathcal{R}_q = \omega(\mathcal{D}_q) \subseteq \mathcal{R}(\mathcal{M})$ onto the range of $q$. The machine is said to
compute $q$ correctly if, for any $d \in D_q$, when any $b \in \tau(d)$ is supplied to $\mathcal{M}$ and
gives output $e = \omega(b) = \omega \circ \tau(d)$, the decoding $\delta(e)$ of $e$ satisfies

$$q(d) = \delta(e) = \delta \circ \omega \circ \tau(d), \qquad d \in D_q.$$

In particular, $\omega \circ \tau$ must be a function. These conditions are summarized in the
following diagram, where all arrows denote total and onto functions or relations:

$$
\begin{array}{ccccccc}
D_q & \xrightarrow{\ \tau\ } & \mathcal{D}_q & \subseteq & \mathcal{D}(\mathcal{M}) & \subseteq & \mathcal{B}^L \\
\big\downarrow q & & \big\downarrow \text{restriction of } \omega & & \big\downarrow \omega & & \\
R_q & \xleftarrow{\ \delta\ } & \mathcal{R}_q & \subseteq & \mathcal{R}(\mathcal{M}) & \subseteq & \mathcal{E}^*
\end{array}
$$

To help us understand all of this terminology, we consider the computation of 
question q 2 from the previous example. 

Example 2.2. Recall the question $q_2$ from Example 2.1, where
$D_{q_2} = \{d_i \mid 0 \le i \le 6\}$ and $R_{q_2} \subseteq \{a_0, a_1, a_2\}^*$. Let $\mathcal{M}$ be a deterministic, sequential
halting automaton with a memory $m$ consisting of three cells. Let $\mathcal{B} = \{0,1,2,\phi\}$ and
$\mathcal{E} = \{0,1,2\}$. $\mathcal{M}$ operates as follows: it reads the string of inputs until it encounters
a $\phi$, reading in order memory cell 0, then cell 1, then cell 2; it interprets the string
of characters from $\{0,1,2\}$ as the ternary representation of a natural number; $\mathcal{M}$
computes, also in ternary, the square of this number, prints it, and then halts. So

$$\mathcal{D}(\mathcal{M}) = \bigcup_{i=1}^{2} \{0,1,2\}^i \, \phi \, \{0,1,2,\phi\}^{2-i}$$

$$= \{0\phi0,\ 0\phi1,\ 0\phi2,\ 0\phi\phi,\ 1\phi0,\ 1\phi1,\ 1\phi2,\ 1\phi\phi,\ 2\phi0,\ 2\phi1,\ 2\phi2,\ 2\phi\phi,\ 10\phi,\ 11\phi,\ 12\phi,\ 20\phi,\ 21\phi,\ 22\phi\},$$

$$\mathcal{R}(\mathcal{M}) = \{0,\ 1,\ 11,\ 100,\ 121,\ 221,\ 1100,\ 1211,\ 2101\}.$$

$\mathcal{M}$ computes $q_2$ correctly, if we choose our encoding and decoding relations
appropriately. Let $\tau: D_{q_2} \to \mathcal{B}^3$ be defined as follows:

    $\tau(d_0) = \{0\phi0, 0\phi1, 0\phi2, 0\phi\phi\}$      $\tau(d_4) = \{11\phi\}$
    $\tau(d_1) = \{1\phi0, 1\phi1, 1\phi2, 1\phi\phi\}$      $\tau(d_5) = \{12\phi\}$
    $\tau(d_2) = \{2\phi0, 2\phi1, 2\phi2, 2\phi\phi\}$      $\tau(d_6) = \{20\phi\}$
    $\tau(d_3) = \{10\phi\}$

Thus, $\mathcal{D}_{q_2} = \mathcal{D}(\mathcal{M}) - \{21\phi, 22\phi\}$ and $\mathcal{R}_{q_2} = \mathcal{R}(\mathcal{M}) - \{1211, 2101\}$. Now define
$\delta: \mathcal{R}_{q_2} \to R_{q_2}$ by

    $\delta(0) = a_0$              $\delta(121) = a_1 a_2 a_1$
    $\delta(1) = a_1$              $\delta(221) = a_2 a_2 a_1$
    $\delta(11) = a_1 a_1$         $\delta(1100) = a_1 a_1 a_0 a_0$
    $\delta(100) = a_1 a_0 a_0$

So the machine $\mathcal{M}$ with encoding $\tau$ and decoding $\delta$ computes $q_2$ correctly. For
instance,

$$\delta \circ \omega \circ \tau(d_0) = \delta \circ \omega(0\phi\phi) = \delta(0) = a_0 = q_2(d_0),$$
$$\delta \circ \omega \circ \tau(d_6) = \delta \circ \omega(20\phi) = \delta(1100) = a_1 a_1 a_0 a_0 = q_2(d_6).$$
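
The claim of Example 2.2 can be checked mechanically with the following Python sketch. Several conventions here are assumptions made for illustration only: the endmarker symbol of the memory alphabet is written as the character `t`, a memory state is a three-character string, an answer such as $a_1 a_0 a_0$ is the tuple `(1, 0, 0)`, and the names `omega`, `TAU`, `DELTA`, `Q2`, and `computes_correctly` are hypothetical.

```python
END = "t"  # stand-in for the endmarker symbol of the memory alphabet


def to_ternary(n):
    """Ternary numeral of a natural number, without leading zeros."""
    if n == 0:
        return "0"
    digits = ""
    while n > 0:
        digits = str(n % 3) + digits
        n //= 3
    return digits


def omega(memory):
    """Characteristic function of the squaring machine.

    Reads cells in order up to the first endmarker, interprets the digits
    read as a ternary number, and returns its square in ternary. Raises
    ValueError for memory states outside the acceptance set.
    """
    digits = ""
    for cell in memory:
        if cell == END:
            break
        digits += cell
    else:
        raise ValueError("no endmarker found")
    if not digits:
        raise ValueError("no digits before the endmarker")
    return to_ternary(int(digits, 3) ** 2)


# Encoding relation tau: index i of d_i -> set of representing memory states.
TAU = {i: {f"{i}{END}{x}" for x in "012" + END} for i in range(3)}
TAU.update({3: {"10" + END}, 4: {"11" + END}, 5: {"12" + END}, 6: {"20" + END}})

# Decoding function delta: machine output -> answer as a tuple of a-indices.
DELTA = {
    "0": (0,), "1": (1,), "11": (1, 1), "100": (1, 0, 0),
    "121": (1, 2, 1), "221": (2, 2, 1), "1100": (1, 1, 0, 0),
}

# The question q_2 from Example 2.1, in the same tuple notation.
Q2 = {
    0: (0,), 1: (1,), 2: (1, 1), 3: (1, 0, 0),
    4: (1, 2, 1), 5: (2, 2, 1), 6: (1, 1, 0, 0),
}


def computes_correctly():
    """Check delta(omega(b)) == q_2(d) for every d and every b in tau(d)."""
    return all(
        DELTA[omega(b)] == Q2[i] for i, codes in TAU.items() for b in codes
    )
```

The check succeeds for every representative of every data base, which also illustrates why $\omega \circ \tau$ must be a function: all four memory states representing $d_0$ produce the same output.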

2.4 The Problem Domain

Because in this thesis we are concerned with representing list structures, we
consider a data base $d \in \mathbb{D}$ to be a string of characters chosen from the problem
alphabet $X$. For notational convenience, we formally represent $d$ as a set of $|d|$
ordered pairs, containing one value $d(n)$ from the alphabet $X$ for each $n \in \mathbb{N}$ less
than $|d|$:

$$d = \{(n, d(n)) \mid 0 \le n < |d|,\ d(n) \in X\}.$$

When there is no chance of ambiguity, we may write $d = x_1 x_2 x_1 x_3$ to stand for

$$d = \{(0, x_1), (1, x_2), (2, x_1), (3, x_3)\},$$

where each $x_i \in X$. Thus, what the formal ordered pair notation does is to
explicitly state the implied order of characters in the string $d$. In an obvious way,
the definition of $d$ could be extended to include countably infinite strings; i.e., we
may wish to consider the size of a data base $d \in \mathbb{D}$ to be unbounded. In this
thesis, we shall consider only problem domains $\mathbb{D}$ where for all $d_1, d_2 \in X^k$,
$d_1 \in \mathbb{D}$ if and only if $d_2 \in \mathbb{D}$. Thus, if we allow a string $d_1 \in X^k$ to be in the
domain $\mathbb{D}$, then all strings in $X^k$ are included in $\mathbb{D}$. Certainly there might be
instances where we would want to restrict character sequences, but unless we
consider specific applications it would be difficult to characterize the domain.
Therefore, we consider only problem domains $\mathbb{D}$ of the form $\mathbb{D} = \bigcup_{i \in J} X^i$, for some
$J \subseteq \mathbb{N}$.

Example 2.3. In Example 2.1, the problem domain consists of seven data bases,
$\mathbb{D} = \{d_i \mid 0 \le i \le 6\}$. The problem alphabet is $X = \{0,1\}$, and each $d_i \in X^*$. In
particular,

$$\mathbb{D} = \bigcup_{i \in \{0,1,2\}} X^i = \{\lambda\} \cup X \cup X^2.$$

The data base $d_4$, for example, is the string $01 \in X^2$, which can be formally
written as $\{(0,0), (1,1)\}$. Similarly, we can denote each $d_i \in \mathbb{D}$:

    $d_0 = \{\} = \lambda$                 $d_4 = \{(0,0), (1,1)\} = 01$
    $d_1 = \{(0,0)\} = 0$                  $d_5 = \{(0,1), (1,0)\} = 10$
    $d_2 = \{(0,1)\} = 1$                  $d_6 = \{(0,1), (1,1)\} = 11$
    $d_3 = \{(0,0), (1,0)\} = 00$

Notice that the data base $d_0$ is just the empty string, $\lambda$. When we view $d_0$ as being
represented by a set of ordered pairs, then $d_0 = \{\} = \emptyset$. Thus, we might either
say that $d_0 = \lambda$ or that $d_0 = \emptyset$, depending on our viewpoint at the moment.
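
The two views of a data base, as a string and as a set of ordered pairs, can be illustrated with a small Python sketch. The function names `as_pairs` and `problem_domain` are hypothetical, and characters are kept as one-character strings rather than integers.

```python
from itertools import product

X = ("0", "1")  # the problem alphabet of Example 2.1


def as_pairs(d):
    """Ordered-pair view of a string d: the set {(n, d(n)) : 0 <= n < |d|}."""
    return {(n, ch) for n, ch in enumerate(d)}


def problem_domain(J, alphabet=X):
    """Union of X^i over i in J, listed by increasing length."""
    return [
        "".join(chars)
        for i in sorted(J)
        for chars in product(alphabet, repeat=i)
    ]
```

For `J = {0, 1, 2}` the listing comes out in the order $d_0, \ldots, d_6$ used above, and the null string maps to the empty set of pairs.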

2.5 Machine Representation of the Problem Domain 

As we have observed, a data base itself cannot be stored in memory. Instead, 
we store some encoding of the data base, a string of values from the alphabet $\mathcal{B}$.
Each $d \in \mathbb{D}$ is mapped by $\tau$ into some subset of $\mathcal{B}^L$. It is unnecessarily restrictive,
however, to require that an encoding $\tau$ specify values for every memory cell. In
fact, most computer systems allocate only certain sections of memory to a given 
user, and other users may write in the remaining cells of memory in ways unknown 
to the first user. In order to model practical memory allocation schemes such as 
linked lists (recall Section 1.2), it is necessary to allow an encoding to specify values 
for only some of the memory cells. 

Thus, we view $\tau(d)$ as some set of codewords, a subset of the code $C = \tau(\mathbb{D})$
(see Elias [8]). Each codeword $c \in C$ is itself a finite set

$$c = \{(j, c(j)) \mid j \in D(c)\}$$

of $|c|$ ordered pairs. The first coordinate of each pair $(j, c(j))$ is the integer
address $j \in \mathbb{N}$ of a cell in memory, and the second coordinate is the value $c(j) \in \mathcal{B}$
assigned by $c$ to be stored at that address. Thus, each codeword in $C$ is a partial
function $c: \mathbb{N} \to \mathcal{B}$ from integer addresses to values in $\mathcal{B}$; its domain, $D(c)$, is a
finite subset of $\mathbb{N}$.

We denote by $\mathcal{B}^+$ the class of all such partial functions from $\mathbb{N}$ to $\mathcal{B}$ that are
each defined on a finite domain. Thus, a codeword set $C$ is just a subset $C \subseteq \mathcal{B}^+$.
The domain $D(C)$ of a set $C \subseteq \mathcal{B}^+$ is the union of the domains of its members:

$$D(C) = \bigcup_{c \in C} D(c).$$

Example 2.4. Let $\mathcal{B} = \{0,1\}$, and consider the code $C_1 = \{c_0, c_1, c_2\}$, where

    $c_0 = \{(0,0), (2,1)\}$
    $c_1 = \{(0,1), (1,0)\}$
    $c_2 = \{(1,1), (2,0)\}$

Each codeword $c_i$ is a partial function $c_i: \mathbb{N} \to \{0,1\}$, so $c_i \in \mathcal{B}^+$ and $C_1 \subseteq \mathcal{B}^+$.
Notice that $D(c_0) = \{0,2\}$, $D(c_1) = \{0,1\}$, $D(c_2) = \{1,2\}$, and $D(C_1) = \{0,1,2\}$.
We may find it convenient to represent $C_1$ as an array, as in Figure 2.1, where the
i-th row represents codeword $c_i$. The entries in each row correspond to the contents
of the corresponding memory cells. The j-th entry in row $c_i$ is the value $c_i(j)$ if
$j \in D(c_i)$ and is blank if $j \notin D(c_i)$. Each column corresponds to a memory cell
address, here 0, 1, or 2.



    $D(C_1)$:    0    1    2

    $c_0$        0         1
    $c_1$        1    0
    $c_2$             1    0

Figure 2.1. Representation of code $C_1$ as an array.
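
Since a codeword is a partial function from addresses to values, a Python dictionary is a natural stand-in for it. The sketch below rebuilds the array of Figure 2.1 from the codewords of Example 2.4; the names `C1`, `code_domain`, and `as_array` are hypothetical, and a blank entry is represented by a single space.

```python
# Code C_1 of Example 2.4: each codeword maps cell addresses to stored values.
C1 = {
    "c0": {0: 0, 2: 1},
    "c1": {0: 1, 1: 0},
    "c2": {1: 1, 2: 0},
}


def code_domain(code):
    """D(C): the union of the domains D(c) of the codewords c in C."""
    cells = set()
    for codeword in code.values():
        cells.update(codeword)  # iterating a dict yields its addresses
    return cells


def as_array(code):
    """Array form of a code: one row per codeword, one column per address.

    An entry is the stored value c(j) when j is in D(c), and a blank
    (a single space) otherwise.
    """
    columns = sorted(code_domain(code))
    return {
        name: [codeword.get(j, " ") for j in columns]
        for name, codeword in code.items()
    }
```

Here `as_array(C1)["c0"]` is `[0, " ", 1]`: cell 1 is left unspecified by $c_0$ and so remains available to other users.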



Recall that we write $\mathcal{B}^L$ to denote the set of all $L$-celled memories. Then a
memory state $m$ is in $\mathcal{B}^L$ if $m \in \mathcal{B}^{+}$ and its domain is
$D(m) = \{0, 1, 2, \ldots, L-1\}$, so that

$$m = \{(0, m(0)), (1, m(1)), \ldots, (L-1, m(L-1))\},$$

where the first member of each pair $(n, m(n))$ is the integer address $n \in \mathbb{N}$
of a cell in memory, and the second is the contents $m(n) \in \mathcal{B}$ of cell $n$.
(Recall that $L$ may be infinite.) A codeword $c \in C$ is stored in a memory
$m \in \mathcal{B}^L$ by setting $m(j) = c(j)$ for all $j \in D(c)$. Other users may fill
in the values of the $L - |c|$ cells not occupied by $c$ but must leave $c$ itself
undisturbed.

For any string $b \in \mathcal{B}^{+}$ we can define its L-closure, $b^L$, as the set

$$b^L = \{m \in \mathcal{B}^L \mid b \subseteq m\}$$

of memories in $\mathcal{B}^L$ that store $b$, in the sense that the (address, value)
pairs in $b$ are included among those in $m$. For $L \le \max D(b)$, $b^L = \emptyset$.
Where the value $L$ is understood, we frequently write $\bar{b}$ to mean $b^L$. Note
that $|b^L| = |\mathcal{B}|^{L-|b|}$.

Define the set

$$\mathcal{B}^{*} = \bigcup_{L} \mathcal{B}^L$$

of all finite memories that store values from $\mathcal{B}$. Then for
$b \in \mathcal{B}^{*}$, $D(b) = \{0, 1, \ldots, L\}$ for some $L \in \mathbb{N}$. So the
L-closure of $b$ contains all sequences in $\mathcal{B}^L$ with prefix $b$:
$\bar{b} = b \cdot \mathcal{B}^{L-|b|}$.

Example 2.5. Recall code $C_1$ from Example 2.4, where $\mathcal{B} = \{0,1\}$. Since
$|(c_i)^L| = |\mathcal{B}|^{L-|c_i|}$, then $|(c_i)^3| = 2^{3-|c_i|} = 2$. So for $L = 3$
there are two memory states which contain the codeword $c_i$. In particular,

$$(c_0)^3 = \{m \in \mathcal{B}^3 \mid c_0 \subseteq m\}
          = \{\{(0,0), (1,0), (2,1)\}, \{(0,0), (1,1), (2,1)\}\}.$$

We can represent the 3-closures of $c_0$, $c_1$, $c_2$ in array form, as in Figure 2.2.
Notice that no matter how other users may fill in memory cells $n$ where
$n \notin D(c_i)$, it is always possible to tell precisely what codeword $c_i$ is being
stored. Since $L = 3$ and $\mathcal{B} = \{0,1\}$, there are eight possible memory
states, six of which store codewords from $C_1$.

Also note that

$$(c_0)^2 = \emptyset$$
$$(c_0)^4 = \{\{(0,0), (1,0), (2,1), (3,0)\}, \{(0,0), (1,0), (2,1), (3,1)\},
            \{(0,0), (1,1), (2,1), (3,0)\}, \{(0,0), (1,1), (2,1), (3,1)\}\}.$$

Since $c_1 \in \mathcal{B}^2$, $c_1 \in \mathcal{B}^{*}$, but
$c_0, c_2 \notin \mathcal{B}^{*}$. ∎

Figure 2.2. Representation of the closures of codewords in $C_1$.
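
The closure construction of Example 2.5 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `closure` and the encoding of a codeword as a dictionary from cell address to value are assumptions made here, not notation from the text. The L-closure is enumerated by filling every unspecified cell in all possible ways.

```python
from itertools import product

def closure(codeword, L, alphabet):
    """Enumerate the L-closure of a codeword.

    codeword: dict mapping cell address -> value (a partial function).
    Returns a sorted list of length-L tuples, one per memory state that
    contains the codeword.
    """
    # A codeword that names a cell outside 0..L-1 cannot be stored at all.
    if any(j >= L for j in codeword):
        return []
    free = [j for j in range(L) if j not in codeword]
    states = []
    for fill in product(alphabet, repeat=len(free)):
        m = [None] * L
        for j, v in codeword.items():
            m[j] = v                      # cells fixed by the codeword
        for j, v in zip(free, fill):
            m[j] = v                      # cells left to other users
        states.append(tuple(m))
    return sorted(states)

# Code C_1 of Example 2.4 over the alphabet {0, 1}.
c0 = {0: 0, 2: 1}
c1 = {0: 1, 1: 0}
c2 = {1: 1, 2: 0}

print(closure(c0, 3, (0, 1)))       # [(0, 0, 1), (0, 1, 1)]
print(closure(c0, 2, (0, 1)))       # []
print(len(closure(c0, 4, (0, 1))))  # 4
```

Taking the union of the three 3-closures gives six of the eight states of a 3-cell binary memory, in agreement with the count stated in the example.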



Having discussed what we mean by an encoding $\tau: \mathbb{D} \to \mathcal{B}^L$ and a
code $C \subseteq \mathcal{B}^{+}$, we can now explain what we shall mean by a
representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$. Throughout the thesis, unless
otherwise specified, we always make the assumption that $\rho$ is a one-to-one function.
Thus $\rho(d)$ is a single codeword in $\mathcal{B}^{+}$, and

$$(\forall d_i, d_j \in \mathbb{D})(i \ne j \Rightarrow \rho(d_i) \ne \rho(d_j)).$$

The one-to-one condition guarantees that distinct data bases $d_i$ and $d_j$ map to
distinct codewords. Since

$$\rho^L(d) = \{m \in \mathcal{B}^L \mid \rho(d) \subseteq m\},$$

we can see that the relation $\rho^L$ corresponds to the relation $\tau$ in Section 2.3.
When $\mathcal{M}$'s memory contains precisely $L$ cells, a specification of a
representation $\rho$ indicates, for any $d \in \mathbb{D}$, that the cells in
$D(\rho(d))$ be filled in as specified and the remaining cells can be filled in any
possible way by other users.

For instance, suppose we have some representation $\rho$ for which
$\rho(d_0) = \{(0,1), (2,0)\}$; i.e., $d_0 \in \mathbb{D}$ is represented by any memory
state in which $m(0) = 1$ and $m(2) = 0$. Since the value $m(1)$ to be stored in cell 1
is not specified, cell 1 corresponds to a "don't care". For $L = 3$, we shall find it
convenient to write $\rho(d_0)$ = 1_0 to mean $\rho(d_0) = \{(0,1), (2,0)\}$. Where $L$
is understood, we may even write $\rho(d_0)$ = 1_0 rather than $\rho(d_0)$ = 1_0__ for
$L = 5$; i.e., we may suppress all trailing "don't cares", which serve simply as place
holders.

We saw in Example 2.5 that if $c_i \in C_1$ is stored in memory, then it is always
possible to distinguish $c_i$, no matter what other users have done with cells not in
$D(c_i)$. In other words, there is no memory state in $\mathcal{B}^L$ that stores both
$c_i$ and $c_j$, for $i \ne j$. When this is the case, we say that $c_i$ and $c_j$ are
distinguishable.

Definition. Let $\rho: \mathbb{D} \to \mathcal{B}^{+}$, and let
$d_1, d_2 \in \mathbb{D}$. Then $\rho(d_1)$ and $\rho(d_2)$ are said to be
distinguishable if and only if

$$\rho^L(d_1) \cap \rho^L(d_2) = \emptyset$$

for any $L > \max\{\max D(\rho(d_1)), \max D(\rho(d_2))\}$.

In other words, a code $C \subseteq \mathcal{B}^{+}$ is distinguishable if and only if
the closures of its members are pairwise disjoint (see Elias [8]).

If there exist $d_1, d_2 \in \mathbb{D}$ such that $\rho(d_1)$ and $\rho(d_2)$ are not
distinguishable, then for some memory state $m_0$ it is not possible to tell whether
$d_1$ or $d_2$ is stored; in fact, $m_0$ represents both $d_1$ and $d_2$. We do not want
to allow this loss of information and so make the following formal definition of a
representation.

Definition. We say that a function $\rho: \mathbb{D} \to \mathcal{B}^{+}$ is a
representation if and only if for all $d_1, d_2 \in \mathbb{D}$, where $d_1 \ne d_2$,
$\rho(d_1)$ and $\rho(d_2)$ are distinguishable.

Example 2.6. Let $\mathbb{D} = \{d_0, d_1, d_2\}$, $\mathcal{B} = \{0,1\}$, and $L = 3$.
Consider the function $\rho: \mathbb{D} \to \mathcal{B}^{+}$ defined by

        $\rho(d_0)$ = 0_1
        $\rho(d_1)$ = 10_
        $\rho(d_2)$ = _00

Then $\rho$ is not a representation, because it does not have disjoint 3-closures. In
particular, $\rho(d_1)$ and $\rho(d_2)$ are not distinguishable:

$$\bar{\rho}(d_1) \cap \bar{\rho}(d_2) = \{100, 101\} \cap \{000, 100\} = \{100\}.$$ ∎

Example 2.7. Let us define the function $\rho: \mathbb{D} \to \mathcal{B}^{*}$ by

        $\rho(d_0)$ = 0φ_        $\rho(d_4)$ = 11φ
        $\rho(d_1)$ = 1φ_        $\rho(d_5)$ = 12φ
        $\rho(d_2)$ = 2φ_        $\rho(d_6)$ = 20φ
        $\rho(d_3)$ = 10φ

where φ denotes the endmarker symbol of the alphabet $\mathcal{B} = \{0, 1, 2, \phi\}$.
Notice that

$$\rho^3(d_0) = \{(0,0), (1,\phi)\}^3 = \{0\phi 0,\ 0\phi 1,\ 0\phi 2,\ 0\phi\phi\}.$$

Thus, there are four memory states that correspond to a representation of $d_0$, and
the relation $\rho^3$ is identical to the relation $\tau$ of Example 2.2. ∎
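
The distinguishability test used in Example 2.6 can be checked mechanically. The following is a minimal sketch, assuming codewords are written as pattern strings in which `_` marks an unspecified cell; the helper names `parse`, `closure`, and `distinguishable` are hypothetical and chosen only for this illustration. Two codewords are reported as distinguishable exactly when their L-closures are disjoint, as in the definition above.

```python
from itertools import product

def parse(pattern, blank="_"):
    """Turn a pattern such as '1_0' into a partial function {0: '1', 2: '0'}."""
    return {j: v for j, v in enumerate(pattern) if v != blank}

def closure(codeword, L, alphabet):
    """Set of all length-L memory states (as strings) that contain codeword."""
    if any(j >= L for j in codeword):
        return set()
    states = set()
    for m in product(alphabet, repeat=L):
        if all(m[j] == v for j, v in codeword.items()):
            states.add("".join(m))
    return states

def distinguishable(c1, c2, L, alphabet):
    """True when no L-cell memory state stores both codewords."""
    return not (closure(c1, L, alphabet) & closure(c2, L, alphabet))

# The function rho of Example 2.6 over the alphabet {0, 1}, with L = 3.
rho = {"d0": parse("0_1"), "d1": parse("10_"), "d2": parse("_00")}

print(sorted(closure(rho["d1"], 3, "01") & closure(rho["d2"], 3, "01")))  # ['100']
print(distinguishable(rho["d0"], rho["d1"], 3, "01"))  # True
print(distinguishable(rho["d1"], rho["d2"], 3, "01"))  # False
```

The brute-force enumeration is exponential in L and is meant only to mirror the definition; it is adequate for the three-cell examples of this section.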

From now on, we define an encoder by specifying a representation function $\rho$.
Then any string $b \in \rho^L(d)$ represents the data base $d$.

2.6 Solution of Dynamic Problems 

In Section 2.3 we explained what it means for a machine $\mathcal{M}$ to answer
correctly a question $q$. Now that we have also discussed what we mean by a
representation, we can explicitly state what we mean when we say that a machine
$\mathcal{M}$ solves some list problem.

We can extend the notion of the computation of a function (question) $q$ to include
the solution of a set of questions $Q = \{q_i \mid i \in J\}$, where each
$q_i: \mathbb{D} \to R_i$ maps a common domain $\mathbb{D}$ onto its own range $R_i$.
Since the ranges are in general different for different questions, a set
$\Delta = \{\delta_i \mid i \in J\}$ of different decodings is allowed. For the solution
of the family of questions $Q$, we introduce a set
$\mathcal{M} = \{\mathcal{M}_i \mid i \in J\}$ of machines with a family
$\Omega = \{\omega_i \mid i \in J\}$ of different characteristic functions, where each
$\omega_i$ maps a memory state to the output of $\mathcal{M}_i$. We can consider
$\mathcal{M}$ to be a single device, with a set $S_J = \{s_i \mid i \in J\}$ of distinct
initial states, or programs. $\mathcal{M}_i$ is the submachine corresponding to
$\mathcal{M}$ started in the initial state $s_i$. We say that
$(\mathcal{M}, \rho, \Delta)$ solves $(Q, \mathbb{D})$ if, for all $i \in J$,
$\mathcal{M}_i$ computes $q_i$ correctly. In other words, if
$(\mathcal{M}_i, \rho, \delta_i)$ computes $q_i$, then for any $m \in \rho^L(d)$,
$q_i(d) = \delta_i \circ \omega_i(m)$.

Having seen what it means for a machine to solve a static problem $(Q, \mathbb{D})$,
let us now extend this to include updates. Recall that in our discussion of the
machine model, it was mentioned that $\mathcal{M}$ may rewrite some of its memory cells.
Thus, when given some input $m_0$, $\mathcal{M}$ may halt in a new memory state $m_1$.
For a machine $\mathcal{M}$ which computes a single function $f$, if we want to be able
to compute $f$ several times in succession, then it is natural to require that this new
memory state be in $\mathcal{M}$'s acceptance set. In fact, if $\mathcal{M}_i$ solves
$(q_i, u_i)$ correctly, then performing the update on any memory state containing
$\rho(d)$ leaves us with a memory state that represents $u_i(d)$, the result of the
problem domain update function. In general, we want a machine $\mathcal{M}$ to compute
a family of functions $F$, and so we represent our update function in the machine
domain by the family of functions $\Upsilon = \{\upsilon_i \mid i \in J\}$, where
$\upsilon_i: \mathcal{D}(\mathcal{M}) \to \mathcal{D}(\mathcal{M})$ for
$\mathcal{D}(\mathcal{M}) = \bigcup_i \mathcal{D}(\mathcal{M}_i) \subseteq \mathcal{B}^L$.

Definition. Consider the machine $\mathcal{M} = \{\mathcal{M}_i \mid i \in J\}$ with
the family $\Omega = \{\omega_i \mid i \in J\}$ of characteristic functions and the
family $\Upsilon = \{\upsilon_i \mid i \in J\}$ of update functions. We say that
$(\mathcal{M}, \rho, \Delta)$ solves the dynamic problem $(F, \mathbb{D})$ if the
following conditions are satisfied for all $f_i = (q_i, u_i) \in F$:

(1) $q_i(d) = \delta_i \circ \omega_i \circ \rho^L(d)$

(2) $\upsilon_i(\rho^L(d)) \subseteq \rho^L(u_i(d))$.

2.7 Solution of a List Problem 

In this section we merely want to summarize what we shall mean when we talk 
about the solution of a list problem. 

First, recall from Section 2.1 that a list problem is a storage and retrieval
problem $(F, \mathbb{D})$ where the domain elements have some list structure, e.g., they
may be stacks. In any case the problem domain $\mathbb{D}$ consists of strings of
characters chosen from the problem alphabet $X$ and is of the following form:
$\mathbb{D} = \bigcup_{i \in J} X^i$ for some $J \subseteq \mathbb{N}$. For any
$d \in \mathbb{D}$, we want to be able to perform the operations in $F$; e.g., TOP
(return the value at the top of the stack) and POP.

If a machine $\mathcal{M}$ is to solve the list problem $(F, \mathbb{D})$, then there
must be some way to represent each $d \in \mathbb{D}$ in the cells of $\mathcal{M}$'s
memory with machine alphabet $\mathcal{B}$. In particular, there is some one-to-one
representation function $\rho: \mathbb{D} \to \mathcal{B}^{+}$, and any $\rho(d)$ stored
in $m$ can be viewed as some sort of codeword. The representation has the property that
it is always possible to determine what (if any) codeword is currently stored in
memory. What other users do cannot interfere with this determination.

Suppose the current memory state is $m_0$, where $m_0 \in \rho^L(d)$. Then
$\mathcal{M}_i$ will output the answer $\omega_i \circ \rho^L(d)$ and halt in the new
memory state $\upsilon_i(\rho^L(d))$. If we claim that $\mathcal{M}_i$ computes the
function $f_i = (q_i, u_i) \in F$, then
$\upsilon_i(\rho^L(d)) \subseteq \rho^L(u_i(d))$ and there must be some sort of decoding
function $\delta_i$ such that $q_i(d) = \delta_i \circ \omega_i \circ \rho^L(d)$. In
other words, $\mathcal{M}_i$ outputs the machine representation of $q_i(d)$ and halts
in a memory state which is included in the set of memory states that represent
$u_i(d)$.

We say that $(\mathcal{M}, \rho, \Delta)$ solves the list problem $(F, \mathbb{D})$ if
the above conditions are satisfied for all $f_i \in F$ and for all $d \in \mathbb{D}$.
For simplicity, we shall also assume that each decoding function $\delta_i \in \Delta$
is one-to-one. Thus, we speak of a system $(\mathcal{M}, \rho)$ solving a problem
$(F, \mathbb{D})$.

When we discuss the machine solution of a problem $(F, \mathbb{D})$, we have in mind
a representation of the domain $\mathbb{D}$ in memory and some collection $\mathcal{A}$
of algorithms or programs which compute the functions $F$. Any algorithm
$\mathcal{A}_i$ that we discuss can be implemented by a machine $\mathcal{M}_i$ as
defined above. Since we do not, however, always want to concern ourselves with all the
details of the machine itself, we shall henceforth speak of a system
$(\mathcal{A}, \rho)$ solving a problem $(F, \mathbb{D})$. Thus we specify an
implementation by defining the function $\rho$ and by, in some (usually program-like)
form, presenting the set of algorithms $\mathcal{A}$ (which can be implemented by
machine $\mathcal{M}$).

CHAPTER 3

STORAGE AND ACCESS COSTS 

In Section 3.1 we introduce various system costs involved in solving a problem. 
Since in this thesis we are concerned with obtaining lower bounds on storage and 
access costs, these costs are discussed more fully in sections 3.2 and 3.3, respectively. 
We first define our cost measures and then present some basic results. For further 
information the interested reader is referred to Elias [6], [9], [10].

3.1 System Costs 

Many different systems can be used to solve the same problem, and the choice 
among them depends on their relative costs. There are three basic components of 
system cost: 

(1) Storage cost. There is always some sort of purchase or rental cost for 
the memory used to store the representation of a data base. 

(2) Access cost. This refers to the number of memory cell accesses made 
by an algorithm or machine and is a partial indication of the time 
required by a system to answer a question or perform an update. 

(3) Processor cost. This involves the costs in memory and logic of the
algorithm or machine $\mathcal{M}$ itself.

For several reasons, we do not in this thesis consider the processor cost. First, 
any such measure would reflect characteristics of the particular machine, and it is 
therefore difficult to determine an appropriate measure. We have deliberately tried 
to let our machine model be as general as possible. Second, the list implementations
we do consider are in general quite straightforward and therefore a system which 
does well for both storage and access costs probably would not have a prohibitive 
processor cost. Third, the storage-access trade-off is easier to recognize and we do 
not want the current analysis to become too complex. 



3.2 Storage Costs 

One measure of the memory requirements of a retrieval system $(\mathcal{A}, \rho)$
solving a problem $(F, \mathbb{D})$ is the number of memory cells dedicated to the
storage of a representation in memory.

Definition. Consider a system $(\mathcal{A}, \rho)$ solving a problem
$(F, \mathbb{D})$, and assume that $\rho$ is a function. The memory storage cost,
$|\rho(d)|$, associated with any data base $d \in \mathbb{D}$ is the number of memory
cells for which representation $\rho$ specifies a value when representing $d$:

$$|\rho(d)| \triangleq |D(\rho(d))|.$$

Thus, we define $|\rho(d)|$ to be the number of memory cells occupied by the codeword
$\rho(d)$. There is, however, no requirement that the set of occupied cells be
contiguous; i.e., there may be "gaps" or "holes" in the representation. Because we
are essentially concerned with obtaining lower bounds, we charge only for the cells
actually occupied by $\rho(d)$ and do not charge for these gaps.

Example 3.1. Let $\mathcal{B} = \{0,1\}$ and define the code
$C_2 = \{c_0, c_1, c_2, c_3, c_4\}$ as follows:

        $c_0$ = 0_1        $c_3$ = 000
        $c_1$ = 10_        $c_4$ = 111
        $c_2$ = _10

Suppose that $\mathbb{D} = \{d_0, d_1, d_2, d_3, d_4\}$ and the representation
$\rho: \mathbb{D} \to \mathcal{B}^{+}$ is defined by $\rho(d_i) = c_i$. Then

$$|\rho(d_0)| = |\rho(d_1)| = |\rho(d_2)| = 2$$
$$\text{and} \quad |\rho(d_3)| = |\rho(d_4)| = 3.$$ ∎
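
As a small illustration of the storage measure, the costs of Example 3.1 can be computed by counting the specified cells of each codeword. This sketch assumes codewords are written as pattern strings with `_` for an unspecified cell; the name `storage_cost` is hypothetical.

```python
def storage_cost(pattern, blank="_"):
    """Number of cells for which the pattern specifies a value, |rho(d)|."""
    return sum(1 for v in pattern if v != blank)

# Code C_2 of Example 3.1, with rho(d_i) = c_i.
C2 = {"d0": "0_1", "d1": "10_", "d2": "_10", "d3": "000", "d4": "111"}
costs = {d: storage_cost(p) for d, p in C2.items()}
print(costs)  # {'d0': 2, 'd1': 2, 'd2': 2, 'd3': 3, 'd4': 3}
```

Note that the gap in `0_1` is not charged: only the two specified cells count toward the cost.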

Certainly the issue of memory management is an important one, because it 
may be difficult to efficiently allocate to a single user the unspecified memory cells 
corresponding to holes in another user's memory space. Elias [9] has addressed the
problem of assigning a contiguous section of memory, defining the span of a
representation p to be the smallest set of contiguous memory cells capable of holding 
the representation of any domain element. Many representation schemes we shall 
construct will be able to avoid such gaps, at least when the problem alphabet is of 
the appropriate size. 

Our storage cost measure does not indicate the complexity of the encoding p. 
For a static problem, storing a representation would be only a one-time task. When 
we consider dynamic problems, the complexity of the representation will evidence 
itself in the costs of performing updates. In general, a complicated encoding results 
in higher access costs. 

Consider a code $C \subseteq \mathcal{B}^{+}$ that has the property that for each
$c \in C$, $D(c) = \{0, 1, \ldots, |c|-1\}$; i.e., $C \subseteq \mathcal{B}^{*}$. Then
$C$ is said to be a prefix code, or to be prefix-free, if none of its members is a
prefix of any other. In other words, a prefix-free set $C \subseteq \mathcal{B}^{*}$
has the property that

$$(\forall c_1, c_2 \in C)(c_1 \not\subset c_2).$$

As noted by Elias [8], a code $C \subseteq \mathcal{B}^{*}$ is distinguishable if and
only if it is a prefix code.

The well known Kraft inequality [2], [12], [16] states that a necessary and
sufficient condition for the existence of a prefix code with codeword lengths
$\ell_1, \ell_2, \ldots, \ell_k$ and codeword characters chosen from the alphabet
$\mathcal{B}$ is that

$$\sum_{i=1}^{k} |\mathcal{B}|^{-\ell_i} \le 1.$$

This result is probably most easily seen by recalling the simple correspondence
between prefix codes and labeled trees. Each node corresponds to a memory cell
number, and the branch labels correspond to the cell contents; i.e., there are
$|\mathcal{B}|$ branches from each node. Each codeword is associated with a distinct
leaf. We adopt the convention that the leftmost branch of each node always
corresponds to the same element $b \in \mathcal{B}$, and similarly for each of the
other branches. For full trees this convention eliminates the need for writing the
labels on branches emanating from non-root nodes. In particular, for
$\mathcal{B} = \{0,1\}$, we always let a leftward branch correspond to a zero and a
rightward branch to a one.

Example 3.2. Recall the representation $\rho: \mathbb{D} \to \mathcal{B}^{*}$ from
Example 2.7, where $\mathcal{B} = \{0, 1, 2, \phi\}$ and φ is the endmarker. The code

        $\rho(\mathbb{D})$ = {0φ, 1φ, 2φ, 10φ, 11φ, 12φ, 20φ}

is a prefix code and satisfies the Kraft inequality because

$$\sum_{c \in \rho(\mathbb{D})} |\mathcal{B}|^{-|c|} = 3 \cdot 4^{-2} + 4 \cdot 4^{-3} = \frac{1}{4} < 1.$$

The tree corresponding to the code $\rho(\mathbb{D})$ is illustrated in Figure 3.1. ∎

Figure 3.1. Tree corresponding to $\rho$ from Example 3.2.
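
The prefix property and the Kraft sum of Example 3.2 can be checked with exact rational arithmetic. The sketch below rests on an assumption about the extracted text: the seven codewords are read as the ternary numerals for 0 through 6, each followed by an endmarker, and `#` is used here as a stand-in for that endmarker symbol. The helper names `is_prefix_free` and `kraft_sum` are hypothetical.

```python
from fractions import Fraction

def is_prefix_free(code):
    """True when no codeword is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in code for b in code)

def kraft_sum(lengths, q):
    """Exact Kraft sum for the given codeword lengths over a q-symbol alphabet."""
    return sum(Fraction(1, q ** n) for n in lengths)

# '#' stands in for the endmarker symbol of the four-letter alphabet.
code = ["0#", "1#", "2#", "10#", "11#", "12#", "20#"]

print(is_prefix_free(code))                  # True
print(kraft_sum([len(c) for c in code], 4))  # 1/4
print(is_prefix_free(["00", "10", "100"]))   # False
```

The last line shows why the endmarker matters: without it, the numeral 10 would be a prefix of 100 and the code would not be distinguishable.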



Elias has extended the Kraft inequality to any distinguishable code
$C \subseteq \mathcal{B}^{+}$.

Theorem 3.1. (Elias [8]). Let $C \subseteq \mathcal{B}^{+}$ be distinguishable. Then

$$\sum_{c \in C} |\mathcal{B}|^{-|c|} \le 1. \tag{3.1}$$

Equivalently, consider any representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$. Then

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} \le 1. \tag{3.2}$$

Proof: Let

$$C_L = \{c \in C \mid L > \max D(c)\}$$

be the subset of the code $C$ whose elements can be stored in an $L$-cell memory.
Since $C$ is distinguishable, the closures of its members are disjoint and we have

$$\bigcup_{c \in C_L} c^L \subseteq \mathcal{B}^L.$$

Recalling also that $|c^L| = |\mathcal{B}|^{L-|c|}$, we obtain

$$\sum_{c \in C_L} |\mathcal{B}|^{L-|c|} \le |\mathcal{B}|^L.$$

Now dividing through by $|\mathcal{B}|^L$ gives

$$\sum_{c \in C_L} |\mathcal{B}|^{-|c|} \le 1.$$

Since $C_L \subseteq C_{L+1}$,

$$\sum_{c \in C_L} |\mathcal{B}|^{-|c|} \le \sum_{c \in C_{L+1}} |\mathcal{B}|^{-|c|} \le 1,$$

and so

$$\sum_{c \in C} |\mathcal{B}|^{-|c|} = \lim_{L \to \infty} \Big( \sum_{c \in C_L} |\mathcal{B}|^{-|c|} \Big) \le 1.$$

This proves (3.1). Since any representation $\rho$ is by definition distinguishable,
the Kraft inequality also holds for representation storage costs and thus (3.2)
follows. ∎

Theorem 3.1 is a statement about distributions of the storage measure $|\rho(d)|$ for
any representation $\rho$ of domain $\mathbb{D}$. Not all data bases in $\mathbb{D}$ can
have short representations, since a small value of $|\rho(d)|$ corresponds to a large
term in the Kraft sum. If some of the data bases have relatively short representations
then others must have relatively long representations. If, in fact, we have equality
in the Kraft sum, then no data base representation can be shortened without
lengthening another data base representation.

Definition. We say that a representation $\rho$ achieves Kraft storage if and only
if the Kraft sum of equation (3.2) is satisfied with equality:

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = 1. \tag{3.3}$$

Similarly, a code $C$ achieves Kraft storage if the Kraft sum of equation (3.1) is
equal to one.

We can also extend our usage of trees to correspond to any distinguishable
code $C \subseteq \mathcal{B}^{+}$. However, since we do not restrict ourselves to
prefix codes (i.e., we allow scattered representations), we would not necessarily
choose to have the memory cells read in order 0, 1, 2, . . . on the path to every
leaf. This and the result of Theorem 3.1 are illustrated in the following example.

Example 3.3. a) For code $C_1$ of Example 2.4, $|\mathcal{B}| = 2$ and

$$\sum_{c \in C_1} |\mathcal{B}|^{-|c|} = 2^{-2} + 2^{-2} + 2^{-2} = \frac{3}{4} < 1.$$

A tree corresponding to $C_1$ is given in Figure 3.2a, with the memory cells listed in
order 0, 1, 2. On the other hand, we might choose to represent $C_1$ by the tree in
Figure 3.2b. In any case, $C_1$ does not achieve Kraft storage.

Figure 3.2. Trees corresponding to code $C_1$: (a), (b).

b) For code $C_2$ of Example 3.1,

$$\sum_{c \in C_2} |\mathcal{B}|^{-|c|} = 3 \cdot 2^{-2} + 2 \cdot 2^{-3} = 1$$

and so $C_2$ achieves Kraft storage. A tree for code $C_2$ is given in Figure 3.3.

c) Recall once again the representation $\rho: \mathbb{D} \to \mathcal{B}^{*}$ from
examples 2.7 and 3.2. Then

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = \sum_{c \in \rho(\mathbb{D})} |\mathcal{B}|^{-|c|}
  = 3 \cdot 4^{-2} + 4 \cdot 4^{-3} = \frac{1}{4} < 1,$$

since each $d \in \mathbb{D}$ has a unique representation $\rho(d)$. ∎

Figure 3.3. Tree corresponding to code $C_2$.

When we solve some problem we would like to find a representation that does
not result in high storage costs. We say that a representation
$\rho: \mathbb{D} \to \mathcal{B}^{+}$ is optimal in storage if no other representation
requires less storage for some data base without requiring more storage for another.

Definition. A representation function $\rho: \mathbb{D} \to \mathcal{B}^{+}$ achieves
optimal storage if and only if for any $\rho': \mathbb{D} \to \mathcal{B}^{+}$

$$(\forall d_1 \in \mathbb{D})\big[(|\rho'(d_1)| < |\rho(d_1)|) \Rightarrow (\exists d_2 \in \mathbb{D})(|\rho'(d_2)| > |\rho(d_2)|)\big].$$

Thus, we use the term optimal storage for a representation if no other
representation can uniformly do better. There may, of course, be many
representations that are storage optimal, and which would be preferred depends on
the particular problem and is conditional on the probabilities of the various data
bases in $\mathbb{D}$. In fact, one might not choose to use a storage optimal
representation at all if such a representation resulted in higher access or other
system costs. However, these involve details of particular problems and, for the
general framework we are considering, we shall not usually prefer one optimal
representation over another.

If a representation $\rho$ meets the Kraft sum with equality, then $\rho$ is storage
optimal. This condition makes it easy to recognize certain storage optimal
representations.

Theorem 3.2. Consider the representation function
$\rho: \mathbb{D} \to \mathcal{B}^{+}$. If

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = 1,$$

then $\rho$ is storage optimal. In other words, if $\rho$ achieves Kraft storage then
$\rho$ is storage optimal.

Proof: If $\rho$ is not storage optimal, then there exists some representation
$\rho': \mathbb{D} \to \mathcal{B}^{+}$ such that
$(\forall d \in \mathbb{D})(|\rho'(d)| \le |\rho(d)|)$ and
$(\exists d_1 \in \mathbb{D})(|\rho'(d_1)| < |\rho(d_1)|)$. But this says that

$$1 = \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} < \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho'(d)|},$$

which contradicts the Kraft inequality of Theorem 3.1. ∎



Example 3.4. Recall Example 2.7 where $\mathcal{B} = \{0, 1, 2, \phi\}$, and consider the
alternative encoding $\rho_2: \mathbb{D} \to \mathcal{B}^{+}$ defined by

/> 2 U > =0 p 2 (d 4 ) =01 

p 2 (d,) =1 p 2 (d 5 ) =02 

p z {d z ) =2 p 2 (d 6 ) =00 

P 2 (d 3 ) = 00 

and also the encoding p 3 '-\D -» S defined by 

/> 3 U ) =00 p 3 (d 4 ) =1 

/> 3 (c/,) = p 3 {d 5 ) =2 

/> 3 (c/ 2 ) =02 p 3 (d 6 ) =01 

/> 3 U 3 )=00 

By Theorem 3.2, both $\rho_2$ and $\rho_3$ are storage optimal because

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho_2(d)|} = 3 \cdot 4^{-1} + 4 \cdot 4^{-2} = 1
  = \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho_3(d)|}.$$

On the other hand, $\rho$ as defined in Example 2.7 is not storage optimal because
$\rho_2$ does better; in fact, $\rho_2$ takes less storage everywhere:

$$(\forall d \in \mathbb{D})(|\rho_2(d)| < |\rho(d)|).$$

The representation $\rho_3$ also does better than $\rho$, because it never uses more
storage and sometimes uses less.

If we were forced to pay a very high price for storage, we would probably
choose to solve the problem $(F, \mathbb{D})$ of Example 2.1 using representation
$\rho_2$ or $\rho_3$ rather than $\rho$. However, $\rho$ corresponds to a simple ternary
representation (with φ serving as an endmarker) and might be more desirable than
$\rho_2$ or $\rho_3$ in terms of other costs. ∎

We have seen that a code $\rho(\mathbb{D})$ achieves optimal storage if we get equality
in the Kraft sum. Let us examine the conditions under which this equality is attained.
We first define a distinguishable code $C \subseteq \mathcal{B}^{+}$ to be complete if
and only if for all $c' \in \mathcal{B}^{+}$, $C \cup \{c'\}$ is not distinguishable.
Elias [8] has shown that a finite distinguishable code $C \subseteq \mathcal{B}^{+}$ is
complete if and only if the L-closures of its members partition $\mathcal{B}^L$ (for
$L > \max D(C)$), which is true if $\sum_{c \in C} |\mathcal{B}|^{-|c|} = 1$. The
converse is not true for infinite codes; i.e., a code $C$ may be complete even if
$\sum_{c \in C} |\mathcal{B}|^{-|c|} \ne 1$.

Example 3.5. Recalling Example 3.3, we see that $C_1$ is not complete, since
$C_1 \subset C_2$. However, $C_2$ is complete. If we look at the trees for $C_1$ and
$C_2$, given in figures 3.2 and 3.3, it is easy to see that $C_1$ does not partition
$\{0,1\}^3$, since there are some leaves in the tree for $C_1$ that correspond to no
codeword. Also, by Example 3.3 we know that $C_1$ does not achieve Kraft storage and
thus cannot be complete (since it is finite); $C_2$ does achieve Kraft storage and is
therefore complete. ∎

We can conclude that, as illustrated in the above example, a finite
$|\mathcal{B}|$-ary code $C$ is complete if and only if every leaf in a full
$|\mathcal{B}|$-ary tree for $C$ corresponds to some codeword $c \in C$.
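
For the finite codes of Example 3.5, completeness and the Kraft sum can be checked side by side. The following is a sketch under stated assumptions: codewords are written as pattern strings of exactly L characters with `_` for an unspecified cell, and the helper names `stores`, `uncovered_states`, and `kraft_sum` are hypothetical. A distinguishable code whose closures leave no memory state uncovered partitions the set of L-cell memories.

```python
from fractions import Fraction
from itertools import product

def stores(memory, pattern, blank="_"):
    """True when the memory state agrees with the pattern on every specified cell."""
    return all(p == blank or p == m for p, m in zip(pattern, memory))

def uncovered_states(code, L, alphabet):
    """Memory states of length L that store no codeword of the code."""
    missing = []
    for m in product(alphabet, repeat=L):
        state = "".join(m)
        if not any(stores(state, c) for c in code):
            missing.append(state)
    return missing

def kraft_sum(code, q, blank="_"):
    """Exact Kraft sum; |c| is the number of specified cells of each pattern."""
    return sum(Fraction(1, q ** sum(1 for v in c if v != blank)) for c in code)

C1 = ["0_1", "10_", "_10"]
C2 = C1 + ["000", "111"]

print(uncovered_states(C1, 3, "01"), kraft_sum(C1, 2))  # ['000', '111'] 3/4
print(uncovered_states(C2, 3, "01"), kraft_sum(C2, 2))  # [] 1
```

The two states left uncovered by the first code are exactly the two codewords added to form the second, which is why the second code is complete.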

Using the terminology of representations, we can show that if a representation
$\rho: \mathbb{D} \to \mathcal{B}^{+}$ achieves Kraft storage, then $\rho(\mathbb{D})$ is
complete.

Theorem 3.3. Let $\rho: \mathbb{D} \to \mathcal{B}^{+}$ be some representation which
achieves Kraft storage. Then for all $b \in \mathcal{B}^{+}$, there is some
$d \in \mathbb{D}$ such that $b^L \cap \rho^L(d) \ne \emptyset$.

Proof: Let $\rho$ achieve Kraft storage and assume that there is some
$b \in \mathcal{B}^{+}$ such that, for all $d \in \mathbb{D}$,
$b^L \cap \rho^L(d) = \emptyset$. In other words, $b$ and $\rho(d)$ are
distinguishable, for every $d$. Then

$$\sum_{c \in \rho(\mathbb{D}) \cup \{b\}} |\mathcal{B}|^{-|c|} \le 1
  \quad \Rightarrow \quad \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} < 1,$$

which contradicts the fact that $\rho$ achieves Kraft storage. ∎

The converse is not true (see, once again, Elias [8]). However, if
$\rho(\mathbb{D})$ is complete for $\mathbb{D}$ finite, then we do know that $\rho$
achieves Kraft storage.

Let us briefly mention two results concerning worst case and average storage
costs. The first result follows from well-known tree properties (see e.g. Gallager
[12]) and states that for any representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$,
there is some data base whose representation specifies values for at least
$\lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil$ memory cells. On the other hand, for
any domain $\mathbb{D}$ there is some representation which never requires more than
$\lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil$ memory cells.

Theorem 3.4. (Elias [6]). (i) For any representation function
$\rho: \mathbb{D} \to \mathcal{B}^{+}$,

$$\max_{d \in \mathbb{D}} |\rho(d)| \ge \lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil.$$

(ii) There is some representation function $\rho: \mathbb{D} \to \mathcal{B}^{*}$ such
that

$$\max_{d \in \mathbb{D}} |\rho(d)| = \lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil.$$

This result can be interpreted in terms of any tree corresponding to the
$|\mathcal{B}|$-ary distinguishable code $\rho(\mathbb{D})$, where there must be at
least $|\mathbb{D}|$ leaves (since $\rho$ is one-to-one). Since the tree is
$|\mathcal{B}|$-ary, the depth of the tree (i.e., the length of the longest codeword)
must be at least $\lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil$. Also, a complete,
full $|\mathcal{B}|$-ary tree with $|\mathbb{D}|$ leaves has all of its leaves at
either depth $\lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil$ or depth
$\lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil - 1$.
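
Part (ii) of Theorem 3.4 can be illustrated with a fixed-length code. The sketch below is one simple way to meet the bound, not necessarily the construction Elias uses: every data base index is written as a base-$|\mathcal{B}|$ numeral padded to the common length $\lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil$. The function names and the alphabet string `"0123"` are assumptions for illustration.

```python
def ceil_log(n, q):
    """Smallest k with q**k >= n, computed with integers only."""
    k, power = 0, 1
    while power < n:
        power *= q
        k += 1
    return k

def fixed_length_code(num_items, alphabet):
    """Assign each index 0..num_items-1 a distinct word of length ceil_log."""
    q = len(alphabet)
    k = ceil_log(num_items, q)
    words = []
    for index in range(num_items):
        value, digits = index, []
        for _ in range(k):
            value, r = divmod(value, q)   # peel off base-q digits, low to high
            digits.append(alphabet[r])
        words.append("".join(reversed(digits)))
    return words

print(ceil_log(7, 4))                # 2
print(fixed_length_code(7, "0123"))  # ['00', '01', '02', '03', '10', '11', '12']
```

With seven data bases and a four-letter alphabet, two cells suffice for every data base, whereas the endmarker representation of Example 2.7 uses three cells for four of them.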
The second result involves average storage costs. There will be occasions
where we wish to consider some sort of probability distribution $P$ on the members
of our domain $\mathbb{D}$:

$P(d) \triangleq$ the fraction of time a user expects to consider data base $d \in \mathbb{D}$.

Thus, it makes sense to look at the average storage cost:

$$\sum_{d \in \mathbb{D}} P(d) \cdot |\rho(d)|.$$

We can use a procedure such as Huffman encoding [13], [12] to construct a
representation $\rho$ for which very probable data bases have short representations and
less probable data bases have longer representations. Other preconstructed
universal codes perform almost as well as Huffman codes, provided the shorter
preconstructed representations are assigned to the more probable data bases (see
Elias [7]).

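Theorem 3.5. (Elias [6]). Consider a domain $\mathbb{D}$ and assume there is some
probability distribution $P$ on $\mathbb{D}$. Define the entropy $H(\mathbb{D})$ by

$$H(\mathbb{D}) = - \sum_{d \in \mathbb{D}} P(d) \log_{|\mathcal{B}|} P(d).$$

(i) For any representation function $\rho: \mathbb{D} \to \mathcal{B}^{+}$, the
average storage cost is

$$\sum_{d \in \mathbb{D}} P(d) \cdot |\rho(d)| \ge H(\mathbb{D}).$$

(ii) There is some representation function $\rho: \mathbb{D} \to \mathcal{B}^{+}$ such
that

$$\sum_{d \in \mathbb{D}} P(d) \cdot |\rho(d)| < H(\mathbb{D}) + 1.$$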
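
The two bounds of Theorem 3.5 can be checked numerically. This sketch does not build a Huffman code; it uses the simpler Shannon code lengths $\lceil -\log_{|\mathcal{B}|} P(d) \rceil$, which satisfy the Kraft inequality (so a prefix code with these lengths exists) and already give an average cost below $H(\mathbb{D}) + 1$. The distribution and the function names are assumptions chosen for illustration.

```python
import math
from fractions import Fraction

def shannon_lengths(probs, q):
    """Lengths ceil(-log_q p): the smallest n with q**-n <= p, found exactly."""
    lengths = []
    for p in probs:
        n = 0
        while Fraction(1, q ** n) > p:
            n += 1
        lengths.append(n)
    return lengths

def entropy(probs, q):
    """Entropy of the distribution in base-q units."""
    return -sum(float(p) * math.log(float(p), q) for p in probs if p > 0)

# A hypothetical distribution over three data bases, binary machine alphabet.
probs = [Fraction(3, 5), Fraction(3, 10), Fraction(1, 10)]
q = 2
lengths = shannon_lengths(probs, q)
average = float(sum(p * n for p, n in zip(probs, lengths)))
kraft = sum(Fraction(1, q ** n) for n in lengths)

print(lengths)  # [1, 2, 4]
print(kraft)    # 13/16
print(entropy(probs, q) <= average < entropy(probs, q) + 1)  # True
```

Because the Kraft sum here is below one, this code does not achieve Kraft storage; a Huffman code for the same distribution would shorten the last codeword.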
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3.3 Access Costs 

A user is necessarily concerned with the amount of time it takes to perform an
operation $f$ on some $d \in \mathbb{D}$. The number of memory cell accesses made by an
algorithm before halting is one direct indication of the performance time. This
memory access measure has been used by Minsky and Papert [19] and Elias [5].
The number of accesses made to memory will depend not only on the algorithm
used but also on the particular data base which is stored.

There are various ways in which we could define an access, but we use the 
notion commonly used in Turing machine theory. A machine or algorithm reads a 
cell and, depending on that cell's contents, may rewrite the value stored there; this 
corresponds to only one access. We also choose to allow an algorithm to possibly 
read a cell in another user's memory space, but the algorithm certainly cannot 
rewrite such a cell (without being charged for it in storage). 

Definition. Consider a system $(\mathcal{A}, \rho)$ solving a problem
$(F, \mathbb{D})$. A memory cell access is made each time $\mathcal{A}$ moves to a new
cell. Once $\mathcal{A}$ references a cell, it may read and/or rewrite the cell
contents; this constitutes a single access.

Depending on the hardware of an actual machine, this reading and then rewriting
action might require two accesses, in which case our results could be off by a factor
of two. Flower [11] has investigated update costs and shown that it is necessary for
an access measure to involve both reads and writes; considering either reads or
writes alone does not give reasonable lower bounds.

We present the following example in order to illustrate some of the
terminology we shall use when we discuss the implementation of a function. We
frequently find it convenient to describe an algorithm using a program-like
description.

Example 3.6. Recall examples 2.1 and 2.7 and consider the problem of performing
the update operation $u_2$ on some data base $d \in \mathbb{D}$. The following
algorithm, $\mathcal{A}_{u_2}$, performs the update. (For simplicity, we do not here
consider the question component of the function $f_2$.)

$\mathcal{A}_{u_2}$:  if m(0) = 0 then return
         if m(0) = 1 then if m(1) = 0 then m(1) ← [value illegible in source]
                                           return
                          if m(1) = 1 then m(1) ← [value illegible in source]
                                           m(0) ← 2
                                           return
                          if m(1) = 2 then m(1) ← 0
                                           m(0) ← 2
                                           return
                          if m(1) = φ then m(0) ← [value illegible in source]
                                           return
         if m(0) = 2 then m(0) ← 1
                          return

For instance, suppose we have $\rho(d_0)$ in memory. Given that we know there
is some $\rho(d_i)$ stored, when we access cell 0 and discover that $m(0) = 0$, then we
know that it is $d_0$ stored. Since $u_2(d_0) = d_0$, we do not need to rewrite any
memory cells. Thus, performing the $u_2$ operation on $\rho(d_0)$, using algorithm
$\mathcal{A}_{u_2}$, involves only a reading of cell 0.

Suppose $d_5$ is stored in memory with representation $\rho$. Using algorithm
$\mathcal{A}_{u_2}$, we first access cell 0. Since $m(0) = 1$, we next access cell 1.
Since $m(1) = 2$, we rewrite cell 1, setting it to the new value 0, and then backtrack
and set $m(0) \leftarrow 2$. ∎

Because we spend a great deal of time discussing algorithms for performing various 
operations, we find it convenient to make some notational definitions for dealing 
with memory access costs. 

Definition. Suppose a system $(\mathcal{A}, \rho)$ solves a problem
$(F, \mathbb{D})$. Then for each $d \in \mathbb{D}$ we can define the following.

$[\mathcal{A}_i(\rho(d))] \triangleq$ the sequence of memory cell accesses made by
algorithm $\mathcal{A}_i$ in computing $f_i(d)$ using representation $\rho$.

$\#[\mathcal{A}_i(\rho(d))] \triangleq |[\mathcal{A}_i(\rho(d))]|$, the number of
memory cell accesses made by algorithm $\mathcal{A}_i$ in computing $f_i(d)$ using
representation $\rho$.

$\{[\mathcal{A}_i(\rho(d))]\} \triangleq$ the set of memory cells accessed by
algorithm $\mathcal{A}_i$ in computing $f_i(d)$ using representation $\rho$; i.e., the
access set for $f_i(d)$ corresponding to algorithm $\mathcal{A}_i$.

We may sometimes write $[f_i(\rho(d))]$ to denote the access sequence which an
algorithm $\mathcal{A}_i$ uses to compute $f_i(\rho(d))$.

We refer back to Example 3.6 to illustrate the above definition. 

Example 3.7. Recall the algorithm (k of Example 3.6. In computing u 2 (i/ 5 ), 

2 

(k„ first reads cell 0, then reads and rewrites cell 1, and then backtracks and writes 

u 2 

cell 0. Thus, the access sequence is 0, 1, 0. For notational convenience, when we 
give an access sequence we shall underline any memory cell accesses which 
correspond to writes: 

ca 1 (/ J (rf ))]=o ca,(/>(c/ 4 ))] =010 

[ttjMc/,))] = oio ca^Ug))] = oip_ 

w,(/»(c/ 2 ))3 = o ca,(/»(rf 6 ))] = o 
ca^c/g))] = oi 

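Then for the number of memory cell accesses in each case we clearly have:

\[ \#[\mathcal{A}_{u_2}(\rho(d_0))] = \#[\mathcal{A}_{u_2}(\rho(d_6))] = \#[\mathcal{A}_{u_2}(\rho(d_2))] = 1 \]
\[ \#[\mathcal{A}_{u_2}(\rho(d_1))] = \#[\mathcal{A}_{u_2}(\rho(d_4))] = \#[\mathcal{A}_{u_2}(\rho(d_5))] = 3 \]
\[ \#[\mathcal{A}_{u_2}(\rho(d_3))] = 2 \]

Note also that the access sets are just:

\[ \{[\mathcal{A}_{u_2}(\rho(d_0))]\} = \{[\mathcal{A}_{u_2}(\rho(d_2))]\} = \{[\mathcal{A}_{u_2}(\rho(d_6))]\} = \{0\} \]
\[ \{[\mathcal{A}_{u_2}(\rho(d_1))]\} = \{[\mathcal{A}_{u_2}(\rho(d_3))]\} = \{[\mathcal{A}_{u_2}(\rho(d_4))]\} = \{[\mathcal{A}_{u_2}(\rho(d_5))]\} = \{0, 1\} \]
∎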
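The three quantities just defined (access sequence, access count, access set) can be illustrated with a small tracing memory. This is only a sketch: the class `TracedMemory` and its method names are hypothetical, and it assumes, following the wording "reads and rewrites cell 1", that rewriting the cell just read counts as the same access. It replays the steps described for $u_2$ on $\rho(d_5)$ rather than implementing the algorithm of Example 3.6 in general.

```python
class TracedMemory:
    """Cellular memory that records which cells an algorithm touches."""

    def __init__(self, cells):
        self.cells = list(cells)
        self.sequence = []  # access sequence, e.g. [0, 1, 0]
        self.writes = []    # parallel flags: True where the access wrote

    def read(self, k):
        self.sequence.append(k)
        self.writes.append(False)
        return self.cells[k]

    def write(self, k, value, same_access=False):
        # Assumption: rewriting the cell that was just read is one access.
        if same_access and self.sequence and self.sequence[-1] == k:
            self.writes[-1] = True
        else:
            self.sequence.append(k)
            self.writes.append(True)
        self.cells[k] = value


# Replay the steps described in the text: m(0) = 1 and m(1) = 2 initially.
m = TracedMemory([1, 2])
if m.read(0) == 1:
    if m.read(1) == 2:
        m.write(1, 0, same_access=True)  # rewrite cell 1 to the value 0
        m.write(0, 2)                    # backtrack and set m(0) <- 2

access_sequence = m.sequence      # the sequence [A(rho(d))]
access_count = len(m.sequence)    # the count  #[A(rho(d))]
access_set = set(m.sequence)      # the set    {[A(rho(d))]}
```

The `writes` flags play the role of the underlines in the text: they mark which entries of the access sequence correspond to writes.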

Since our algorithms are sequential and deterministic, we find it convenient to
model them by access trees. Access trees are basically the same as the trees we used
in Section 3.2, where each internal node corresponds to a memory cell access. An
access tree corresponding to the algorithm for a question $q$ will label each leaf by
the appropriate answer $q(d)$, if there is one. We speak of the access tree for $q_i$ (or
$u_j$) to mean the access tree for an algorithm $\mathcal{A}_i$ solving $q_i$ (or $u_j$).

Example 3.8. Consider the static problem $(F, \mathbb{D})$ where $F = \{f_1, f_2\}$ and
$\mathbb{D} = \{d_0, d_1, d_2\}$. Define the representation function $\rho: \mathbb{D} \to \{0,1\}^+$ by:

    $\rho(d_0) = 0\_0$
    $\rho(d_1) = 1\_0$
    $\rho(d_2) = \_\_1$

Let $q_1$ and $q_2$ be defined as follows:

    $q_1(d_0) = a$        $q_2(d_0) = a$
    $q_1(d_1) = b$        $q_2(d_1) = a$
    $q_1(d_2) = b$        $q_2(d_2) = b$

where $a, b \in R$. An access tree corresponding to the obvious algorithm for $q_1$ is
given in Figure 3.4a. Notice that, in fact, two accesses are necessary to distinguish
$\rho(d_0)$ from $\rho(d_1)$ or $\rho(d_2)$ and thus two accesses are required to determine the
leaf that can be labelled $a$. Question $q_2$, however, can be answered after a single
access, to cell 2. ∎

Figure 3.4. Access trees for $q_1$ and $q_2$ of Example 3.8: (a) $q_1$; (b) $q_2$.



Each output corresponds to some leaf on the access tree for $q_i$, and we define
$\alpha_i(r)$ to be the minimum depth of any leaf labelled $r$.

Definition. Suppose a system $(\mathcal{A}, \rho)$ solves a static problem $(Q, \mathbb{D})$, and
let $\mathbb{D}_i(r) = \{d \in \mathbb{D} \mid q_i(d) = r\}$. Then

\[ \alpha_i(r) \triangleq \min_{d \in \mathbb{D}_i(r)} \#[\mathcal{A}_i(\rho(d))]. \]

Similar to our storage result, we have a Kraft inequality for access.

Theorem 3.6. (Elias [6]). If the $|\mathcal{B}|$-ary system $(\mathcal{A}, \rho)$ solves a static
problem $(Q, \mathbb{D})$, then for all $q_i \in Q$:

\[ \sum_{r \in q_i(\mathbb{D})} |\mathcal{B}|^{-\alpha_i(r)} \le 1. \tag{3.4} \]

Corresponding to each answer $r \in q_i(\mathbb{D})$, the range of $q_i$, there is one term in the
summation with negative exponent $\alpha_i(r)$. This theorem is a statement about
distributions on the numbers of accesses to return the answers $r \in R$ and tells us
that not all operations in $q_i(\mathbb{D})$ can have short retrieval times. In fact, inequality
(3.4) can be strengthened; it holds not only for $\alpha_i(r)$, the minimum number of
accesses to return the value $r$, but also for the number of accesses to return the
value $r$ for any $d \in q_i^{-1}(r)$. In other words, if we let $d_r \in q_i^{-1}(r)$, then we have

\[ \sum_{r \in q_i(\mathbb{D})} |\mathcal{B}|^{-\#[\mathcal{A}_i(\rho(d_r))]} \le 1. \]

Definition. Suppose a $|\mathcal{B}|$-ary system $(\mathcal{A}, \rho)$ solves a static problem
$(Q, \mathbb{D})$. Then $\mathcal{A}_i$ is said to achieve Kraft access if

\[ \sum_{r \in q_i(\mathbb{D})} |\mathcal{B}|^{-\alpha_i(r)} = 1. \tag{3.5} \]

In fact, if (3.5) holds and $(\mathcal{A}_i, \rho)$ is understood, we shall frequently say
simply that $q_i$ achieves Kraft access.

If we "assume $q_i$ achieves Kraft access", we mean that we are considering some
system $(\mathcal{A}_i, \rho)$ where $\mathcal{A}_i$ achieves Kraft access and answers $q_i$ on domain $\mathbb{D}$.

In accessing a cell we read some $b \in \mathcal{B}$. Information-theoretically, one access
distinguishes among $|\mathcal{B}|$ possibilities, and if it is not the case that each of these $|\mathcal{B}|$
possible cell contents leads to a different answer, then we have in some sense
obtained more information than is needed. Thus, if an algorithm $\mathcal{A}_i$ achieves Kraft
access, then its access tree must be a full tree where every leaf corresponds to a
distinct $r \in R$. In particular, we have the following result.

Theorem 3.7. Suppose a system $(\mathcal{A}, \rho)$ solves a problem $(Q, \mathbb{D})$. If $\mathcal{A}_i$
achieves Kraft access, then for all $d_1, d_2 \in \mathbb{D}_i(r)$,

\[ \#[\mathcal{A}_i(\rho(d_1))] = \#[\mathcal{A}_i(\rho(d_2))]. \]

Let's look again at the problem from the previous example.

Example 3.9. Recall Example 3.8, and let $R_1$ and $R_2$ denote $q_1(\mathbb{D})$ and $q_2(\mathbb{D})$,
respectively. For $q_1$:

\[ \sum_{r \in R_1 = \{a, b\}} |\mathcal{B}|^{-\alpha_1(r)} = 2^{-\alpha_1(a)} + 2^{-\alpha_1(b)} = 2^{-2} + 2^{-1} = \tfrac{3}{4}. \]

Notice that the access tree for $q_1$ in Figure 3.4a does not have a distinct label for
each leaf and so cannot achieve Kraft access. For $q_2$:

\[ \sum_{r \in R_2} |\mathcal{B}|^{-\alpha_2(r)} = 2^{-1} + 2^{-1} = 1, \]

and so $q_2$ does achieve Kraft access, which is what we would expect by observing
Figure 3.4b. ∎
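The Kraft access sums for the two questions of Example 3.8 can be checked mechanically. The sketch below is illustrative only: the two trees are a reconstruction consistent with the text (read cell 2 first, then cell 0 if needed for $q_1$), not a copy of Figure 3.4, and the names `answer`, `min_depths` and `kraft_access_sum` are hypothetical.

```python
from fractions import Fraction

# An access tree is either a leaf (an answer string) or an internal node
# (cell, {symbol_read: subtree}). Assumed reconstruction of Figure 3.4.
Q1_TREE = (2, {"1": "b", "0": (0, {"0": "a", "1": "b"})})
Q2_TREE = (2, {"0": "a", "1": "b"})

# rho from Example 3.8 as {cell: symbol}; unused cells are simply absent.
RHO = {"d0": {0: "0", 2: "0"}, "d1": {0: "1", 2: "0"}, "d2": {2: "1"}}


def answer(tree, memory):
    """Walk the tree over `memory`; return (answer, number of accesses)."""
    accesses = 0
    while not isinstance(tree, str):
        cell, branches = tree
        tree = branches[memory[cell]]
        accesses += 1
    return tree, accesses


def min_depths(tree, depth=0, found=None):
    """Return {answer: minimum leaf depth}, i.e. alpha(r) for each r."""
    if found is None:
        found = {}
    if isinstance(tree, str):
        found[tree] = min(depth, found.get(tree, depth))
        return found
    _cell, branches = tree
    for subtree in branches.values():
        min_depths(subtree, depth + 1, found)
    return found


def kraft_access_sum(tree, alphabet_size=2):
    """Sum of alphabet_size ** (-alpha(r)) over the answers r in the tree."""
    return sum(Fraction(1, alphabet_size ** a) for a in min_depths(tree).values())
```

Under these assumed trees, `kraft_access_sum(Q1_TREE)` falls short of 1 because the answer b labels two leaves, while `kraft_access_sum(Q2_TREE)` meets the bound with equality.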

As we did for storage costs, we define an implementation or algorithm to be
optimal in access if no other implementation of the operation requires fewer accesses
for some data base representation without requiring more accesses for some other
data base representation.

Definition. An implementation $(\mathcal{A}_i, \rho)$ is access optimal if and only if for
any other implementation $(\mathcal{A}_i', \rho)$:

\[ (\forall d_1 \in \mathbb{D})\bigl[\bigl(\#[\mathcal{A}_i'(\rho(d_1))] < \#[\mathcal{A}_i(\rho(d_1))]\bigr)
   \Rightarrow (\exists d_2 \in \mathbb{D})\bigl(\#[\mathcal{A}_i'(\rho(d_2))] > \#[\mathcal{A}_i(\rho(d_2))]\bigr)\bigr]. \]

Similar to our result for Kraft storage, if $\mathcal{A}_i$ achieves Kraft access then $\mathcal{A}_i$ is access
optimal.

Theorem 3.8. Suppose the $|\mathcal{B}|$-ary system $(\mathcal{A}, \rho)$ solves the static problem
$(Q, \mathbb{D})$. If

\[ \sum_{r \in q_i(\mathbb{D})} |\mathcal{B}|^{-\alpha_i(r)} = 1 \]

then $\mathcal{A}_i$ is access optimal.

Unless we allow the trivial question, which always returns the same value no
matter what data base is stored in memory, it is always necessary to make at
least one access to answer a question.

Theorem 3.9. Given any implementation $(\mathcal{A}_i, \rho)$, assume that $\mathcal{A}_i(\rho(d))$ is
not a constant function. Then, for all $d \in \mathbb{D}$,

\[ \#[\mathcal{A}_i(\rho(d))] \ge 1. \]

Corollary 3.9.1. If $\#[\mathcal{A}_i(\rho(d))] = 1$ for all $d \in \mathbb{D}$, then $\mathcal{A}_i$ is access optimal.

If $|R| < |\mathcal{B}|$, then when we access one cell we can distinguish $|\mathcal{B}|$ characters,
whereas we only have $|R|$ distinct answers. Therefore we have in some sense
obtained more information than we can use, giving us an inequality in the Kraft
sum, as the next theorem shows.

Theorem 3.10. Consider a $|\mathcal{B}|$-ary system $(\mathcal{A}, \rho)$ which answers the question
$q: \mathbb{D} \to R$. If $\mathcal{A}$ achieves Kraft access, then $|R| \ge |\mathcal{B}|$.

Proof: Assume $\mathcal{A}$ meets Kraft with equality. Then by Theorem 3.9 it is always the
case that $\alpha(r) \ge 1$, and so

\[ 1 = \sum_{r \in R} |\mathcal{B}|^{-\alpha(r)} \le \sum_{r \in R} |\mathcal{B}|^{-1} = \frac{|R|}{|\mathcal{B}|}. \]

If $|R| < |\mathcal{B}|$ then we get a contradiction. ∎

Notice that this theorem does not depend on the representation used. 

Assume we have an implementation that achieves Kraft access for some set $Q$
of questions. This then tells us something about the possible relative range sizes of
questions in $Q$. We first recall a lemma about trees (see e.g. Knuth [14]).

Lemma 3.1. There is a full $|\mathcal{B}|$-ary tree with $k$ leaves if and only if there is
some $n \in \mathbb{N}$ such that $k = (|\mathcal{B}| - 1) \cdot n + 1$. (The number $n$ corresponds to the
number of internal nodes in the tree.)

From this lemma and recalling that we have equality in the Kraft sum only when
the exponents correspond to the depths of the leaves in a full tree, we have the
following theorem.

Theorem 3.11. (Gallager [12]). Let $f: J \to \mathbb{N}$. If $\sum_{j \in J} |\mathcal{B}|^{-f(j)} = 1$, then
$|J| = n \cdot (|\mathcal{B}| - 1) + 1$ for some $n \in \mathbb{N}$.

This now tells us something about the possible pairwise relative sizes of the ranges
of questions that each achieve Kraft access.

Theorem 3.12. Consider a $|\mathcal{B}|$-ary system $(\mathcal{A}, \rho)$ which answers the
questions $q_1: \mathbb{D} \to R_1$ and $q_2: \mathbb{D} \to R_2$, and assume both $q_1$ and $q_2$ achieve
Kraft access. Then there is some integer $n$ such that $|R_1| - |R_2| = n \cdot (|\mathcal{B}| - 1)$.

Proof: Since both $q_1$ and $q_2$ achieve Kraft access,

\[ \sum_{r \in R_1} |\mathcal{B}|^{-\alpha_1(r)} = 1 \quad\text{and}\quad \sum_{r \in R_2} |\mathcal{B}|^{-\alpha_2(r)} = 1. \]

By Theorem 3.11 we thus know that there exist $n_1, n_2 \in \mathbb{N}$ such that

\[ |R_1| = 1 + n_1(|\mathcal{B}| - 1) \quad\text{and}\quad |R_2| = 1 + n_2(|\mathcal{B}| - 1). \]

Therefore, $|R_1| - |R_2| = (n_1 - n_2)(|\mathcal{B}| - 1)$. ∎
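The counting condition in Lemma 3.1 and its consequence for range sizes are easy to tabulate. The following is a minimal sketch; the function names are hypothetical, and it assumes that a tree consisting of a single leaf ($n = 0$ internal nodes) counts as a full tree.

```python
def full_tree_leaf_counts(b, max_internal):
    """Leaf counts k = (b - 1) * n + 1 of full b-ary trees, n = 0..max_internal."""
    return [(b - 1) * n + 1 for n in range(max_internal + 1)]


def leaves_after_expansions(b, expansions):
    """Count leaves by growing a tree: each step turns one leaf into b leaves."""
    leaves = 1
    for _ in range(expansions):
        leaves += b - 1  # one leaf becomes internal, b new leaves appear
    return leaves


def range_sizes_compatible(r1, r2, b):
    """Necessary condition (Theorem 3.11) for two ranges to both admit
    Kraft access over a b-ary memory: each size is 1 modulo (b - 1)."""
    return (r1 - 1) % (b - 1) == 0 and (r2 - 1) % (b - 1) == 0
```

For example, with a ternary memory the range sizes 3 and 5 pass this test and differ by a multiple of 2, as Theorem 3.12 requires, while sizes 2 and 3 do not; with a binary memory every pair of sizes passes.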

We find this theorem useful for some of the results we shall prove later. 

As was the case when we discussed storage, it is difficult to understand what 
the Kraft inequality of Theorem 3.8 tells us about access costs of interest to the user, 
except when we actually do achieve Kraft access. Thus, we mention two results 
concerning access costs; these correspond to the storage theorems 3.4 and 3.5. 

First, if we need to distinguish $|R_i|$ answers with a $|\mathcal{B}|$-ary tree, it is clear that
the access tree must have maximum depth at least $\lceil \log_{|\mathcal{B}|} |R_i| \rceil$. Also, it is always
possible to answer a question $q_i$ in such a way that the corresponding access tree
has maximum depth exactly $\lceil \log_{|\mathcal{B}|} |R_i| \rceil$.

Theorem 3.13. (Elias [6]). Consider a problem $(F, \mathbb{D})$.
(i) If the $|\mathcal{B}|$-ary system $(\mathcal{A}_i, \rho)$ answers the question $q_i$, then

\[ \max_{r \in R_i} \alpha_i(r) \ge \lceil \log_{|\mathcal{B}|} |R_i| \rceil. \]

(ii) There is some $|\mathcal{B}|$-ary system $(\mathcal{A}_i, \rho)$ that answers question $q_i$ such that

\[ \max_{r \in R_i} \alpha_i(r) = \lceil \log_{|\mathcal{B}|} |R_i| \rceil. \]

The bound in (ii) can be attained by using a representation $\rho$ which stores in
memory the answers to each question in $Q$. Thus, to answer $q_i$, $\mathcal{A}_i$ simply reads
the $i$-th answer (see Elias and Flower [10]).
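The depth bound of Theorem 3.13 can be computed exactly with integer arithmetic, which avoids floating-point rounding in the logarithm. This is a small sketch with a hypothetical function name; it computes the smallest depth $t$ for which a $|\mathcal{B}|$-ary tree of depth $t$ has enough leaves.

```python
def min_worst_case_accesses(num_answers, b):
    """Smallest t with b**t >= num_answers, i.e. ceil(log_b(num_answers))."""
    if num_answers < 1 or b < 2:
        raise ValueError("need at least one answer and an alphabet of size >= 2")
    depth, reach = 0, 1
    while reach < num_answers:
        reach *= b   # one more level multiplies the number of leaves by b
        depth += 1
    return depth
```

For instance, three answers over a binary memory need a tree of depth 2, and nine answers over a ternary memory also need depth 2.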

If there is some known probability distribution $P$ on $\mathbb{D}$, this induces a
probability distribution $P_i$ on $R_i$ defined by

\[ P_i(r) = \sum_{d \in \mathbb{D}_i(r)} P(d) \]

where $\mathbb{D}_i(r) = \{d \in \mathbb{D} \mid q_i(d) = r\}$.

Theorem 30f 3 g^^^ p q ?f *|^^J 5 fjf r ^|- T I>>, and assume there 
is a probability distribution P on D. Define the entropy IN*,) by 


(ii) There is some \$rmhmmJ^^NMmmim t v^^tm^.m 

CHAPTER 4

THE TABLE LOOKUP QUESTION SET 

In the previous chapter we discussed what is meant by Kraft storage and 
access. In this chapter we shall examine more closely under what conditions Kraft 
storage and Kraft access can be achieved. In particular, we consider the table 
lookup question set and attempt to understand the implications of Kraft storage and 
access and to get a feel for some storage-access tradeoffs. 

4.1 Definition 

If for all i we know the i th element in a list, then we have determined the list. 
Thus, in some sense this forms a complete set of questions on any domain ID, 
because answering these allows us to answer any other question. 

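Definition. Define the table lookup question set

\[ \Gamma = \{\gamma_i \mid 1 \le i \le \max_{d \in \mathbb{D}} |d|\} \]

which has as its $i$-th member the function $\gamma_i: \mathbb{D} \to X$ defined by $\gamma_i(d) = d(i)$.
For $i > |d|$, we say $\gamma_i(d) \triangleq \varnothing$.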
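In the problem domain the table lookup questions are simple to state in code. The sketch below is illustrative: the names `gamma`, `question_range` and `NULL` are hypothetical, lists are modelled as Python strings, and `None` stands in for the null answer.

```python
NULL = None  # stands in for the null answer


def gamma(i, d):
    """Table lookup question: the i-th element of list d (1-indexed), or NULL."""
    if i < 1:
        raise ValueError("positions are numbered from 1")
    return d[i - 1] if i <= len(d) else NULL


def question_range(i, domain):
    """The set of answers gamma_i can return over the whole domain."""
    return {gamma(i, d) for d in domain}


def number_of_questions(domain):
    """Size of the table lookup question set: the longest list length."""
    return max(len(d) for d in domain)


# Lists over X = {0, 1} of length at most 2, including the empty list.
DOMAIN = ["", "0", "1", "00", "01", "10", "11"]
```

Answering every `gamma(i, d)` for `i` up to `number_of_questions(DOMAIN)` determines `d`, which is the sense in which the set is complete.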

Thus, each data base $d \in \mathbb{D}$ is mapped onto the value of its $i$-th element. When
$i > |d|$, then we want $\gamma_i(d)$ to return a null answer, which we denote by $\varnothing$.

Consider a system $(\mathcal{A}, \rho)$ solving $(\Gamma, \mathbb{D})$. As was mentioned in Section 3.3,
if we say that $\gamma_i$ achieves Kraft access, we mean that $\mathcal{A}_i$ solving $\gamma_i$ achieves Kraft
access. In general, although $\gamma_i$ is defined in the problem domain, we may
informally refer to $\gamma_i$ in the machine domain; in particular, we say that $\gamma_i(\rho(d))$
accesses cell $k$ to mean that $k \in \{[\mathcal{A}_i(\rho(d))]\}$.

Example 4.1. Recall Example 2.3, where $\mathbb{D} = \{\lambda\} \cup X \cup X^2$ for $X = \{0,1\}$. Then
we have, for instance,

    $\gamma_1(d_0) = \gamma_1(\lambda) = \varnothing = \gamma_2(d_0)$
    $\gamma_1(d_1) = \gamma_1(0) = 0$
    $\gamma_2(d_1) = \gamma_2(0) = \varnothing$
    $\gamma_1(d_6) = \gamma_1(11) = 1 = \gamma_2(d_6)$.

Alternatively, we may informally write, using the representation $\rho$ given in
Example 2.7:

    $\gamma_1(\rho(d_0)) = \gamma_1(00) = \varnothing = \gamma_2(\rho(d_0))$
    $\gamma_1(\rho(d_6)) = \gamma_1(200) = 1 = \gamma_2(\rho(d_6))$ ∎

If we are going to achieve Kraft access for all questions in the table lookup
question set, then for $|\mathcal{B}| > 2$ the ranges of all the questions must be the same.

Theorem 4.1. Let $\gamma_i, \gamma_j \in \Gamma$ and assume that $|\mathcal{B}| > 2$. If $\gamma_i$ and $\gamma_j$ both
achieve Kraft access, then $R_i = R_j$, where $R_i = R(\gamma_i(\mathbb{D}))$.

Proof: Consider a table lookup question $\gamma_i$ on $\rho: \mathbb{D} \to \mathcal{B}^+$. Since $\mathbb{D} = \bigcup_{j \in J} X^j$,
where $J \subseteq \mathbb{N}$, then either $R(\gamma_i(\mathbb{D})) = X$ or $R(\gamma_i(\mathbb{D})) = X \cup \{\varnothing\}$. Suppose
$|R_i| \neq |R_j|$. Then $|R_i| - |R_j| = \pm 1$. By Theorem 3.12 we know that
$|R_i| - |R_j| = n \cdot (|\mathcal{B}| - 1) = \pm 1$, and so the only solution is $|\mathcal{B}| = 2$, $n = \pm 1$.
Thus, if $|\mathcal{B}| > 2$ we obtain a contradiction, proving that $|R_i| = |R_j|$, which implies
that $R_i = R_j$. ∎

It is easy to show that the condition $|\mathcal{B}| > 2$ is necessary in the above theorem.

Example 4.2. Let $\mathbb{D} = X \cup X^2$ where $X = \{a, b\}$ and define the representation
$\rho: \mathbb{D} \to \{0,1\}^+$ by:

    $\rho(a) = 00\_$       $\rho(ab) = 011$
    $\rho(b) = 10\_$       $\rho(ba) = 110$
    $\rho(aa) = 010$      $\rho(bb) = 111$

Then the table lookup question set $\Gamma = \{\gamma_1, \gamma_2\}$ can be solved by algorithms with
access trees as shown in Figure 4.1. It is clear by observation of these trees that
both $\gamma_1$ and $\gamma_2$ achieve Kraft access, and yet $R_1 = X$ whereas $R_2 = X \cup \{\varnothing\}$. ∎

Figure 4.1. Access trees for $\gamma_1$ and $\gamma_2$ of Example 4.2: (a) $\gamma_1$; (b) $\gamma_2$.

It immediately follows from the previous theorem that if we have Kraft access
for the set of table lookup questions, and $|\mathcal{B}| > 2$, then $\lambda \in \mathbb{D}$ except when $\mathbb{D} = X^n$
for some $n$.

Theorem 4.2. Let $|\mathcal{B}| > 2$. If all $\gamma_i \in \Gamma$ achieve Kraft access, then either
$\lambda \in \mathbb{D}$ or else $\mathbb{D} = X^n$ for some $n \in \mathbb{N}^+$.

Proof: Let $\mathbb{D} \neq X^n$. Then there exist $d_1, d_2 \in \mathbb{D}$ such that $|d_1| < |d_2|$. So
$\gamma_{|d_2|}(d_1) = \varnothing$ and $R(\gamma_{|d_2|}(\mathbb{D})) = X \cup \{\varnothing\}$. Now assume that $\lambda \notin \mathbb{D}$. Then
$R(\gamma_1(\mathbb{D})) = X$. But by Theorem 4.1 this says that $\gamma_1$ and $\gamma_{|d_2|}$ can't both
achieve Kraft access, a contradiction. Therefore $\lambda \in \mathbb{D}$. ∎

Thus, if $|\mathcal{B}| > 2$, then $\lambda \notin \mathbb{D}$ implies that $\mathbb{D} = X^n$ for some $n$. Because
$\lambda \in \mathbb{D}$ gives $R_1 = X \cup \{\varnothing\}$, we know that if $\mathbb{D} \neq X^n$, then $\varnothing \in R_1$.

Corollary 4.2.1. Let $|\mathcal{B}| > 2$. If all $\gamma_i \in \Gamma$ achieve Kraft access and there is
no $n \in \mathbb{N}^+$ such that $\mathbb{D} = X^n$, then $R(\gamma_i(\mathbb{D})) = X \cup \{\varnothing\}$, for all $\gamma_i \in \Gamma$.

4.2 Kraft Access with Overlapping Access Sets

In this section we discuss achieving Kraft access for the set of table lookup
questions $\Gamma$ and frequently refer to the set of memory cells accessed in order to
answer some $\gamma_i \in \Gamma$.

Definition. Let $\rho$ be a representation $\rho: \mathbb{D} \to \mathcal{B}^+$, and let $\gamma_i, \gamma_j \in \Gamma$. Then
we say that $\gamma_i$ and $\gamma_j$ have overlapping access sets if, for some $d \in \mathbb{D}$,

\[ \{[\gamma_i(\rho(d))]\} \cap \{[\gamma_j(\rho(d))]\} \neq \emptyset. \]

We shall show that, for $|\mathcal{B}| > 2$, if all $\gamma_i \in \Gamma$ achieve Kraft access then there can be
no overlapping access sets (see Theorem 4.4). For the case $|\mathcal{B}| = 2$, two access sets
$\{[\gamma_i(\rho(d))]\}$ and $\{[\gamma_j(\rho(d))]\}$ can overlap, but in at most one cell and only
when $X^k \not\subseteq \mathbb{D}$, for all $i \le k < j$ (see Theorem 4.8). Where all $\gamma_i \in \Gamma$ achieve Kraft
access we also show that

\[ \sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)| + |\Gamma| - 1, \]

and if the $\gamma_i$ do not have overlapping access sets then

\[ \sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)| \]

(see corollaries 4.7.1 and 4.5.1).

Consider any representation $\rho$ and suppose that $\gamma_i, \gamma_j \in \Gamma$ meet Kraft access.
Our first theorem says that if $\gamma_i(\rho(d_1))$ and $\gamma_j(\rho(d_1))$ access some cell in
common, then $\mathbb{D}$ does not include all strings of the form $d_1(i) \cdot R_j$
or all strings $R_i \cdot d_1(j)$.

Theorem 4.3. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^+$ and let $\gamma_i, \gamma_j \in \Gamma$ each
achieve Kraft access. Suppose there exists $d_1 \in \mathbb{D}$ such that $\gamma_i(\rho(d_1))$ and
$\gamma_j(\rho(d_1))$ access some cell in common. Then

\[ \neg(\forall r \in R_j)(\exists d_2 \in \mathbb{D})\bigl(d_2(i) = d_1(i) \text{ and } d_2(j) = r\bigr) \]
\[ \text{and}\quad \neg(\forall r \in R_i)(\exists d_2 \in \mathbb{D})\bigl(d_2(i) = r \text{ and } d_2(j) = d_1(j)\bigr). \]

Proof: For any $d_i \in \mathbb{D}$, let $\rho(d_i) \subseteq m_i$. Suppose there is some $d_1 \in \mathbb{D}$ such that
$\gamma_i(\rho(d_1))$ and $\gamma_j(\rho(d_1))$ both access cell $k$. Let $m_1(k) = b_1 \in \mathcal{B}$. Since
$\gamma_i, \gamma_j \in \Gamma$ achieve Kraft access and access cell $k$ then, for all $d_2 \in \mathbb{D}$,

\[ d_2(i) = d_1(i) \Rightarrow m_2(k) = b_1 \]
\[ d_2(j) = d_1(j) \Rightarrow m_2(k) = b_1. \]

Since $\gamma_i$ achieves Kraft access, we know there is some string $d_3 \in \mathbb{D}$ such that
$m_3(k) \neq b_1$, and $\gamma_i$ accesses cell $k$. So there is no way to represent a string $d_4$
where

\[ d_4(i) = d_1(i) \text{ and } d_4(j) = d_3(j). \]

Similarly, there is no way to represent a string $d_5$ where

\[ d_5(i) = d_3(i) \text{ and } d_5(j) = d_1(j). \]
∎

The intuition behind the preceding theorem can perhaps best be seen by picturing 
the access trees for two table lookup questions, as we do in the following example. 
This gives us an example of overlapping storage, although we obviously can't 
represent all strings in the product of the ranges. 

Example 4.3. Let 8 = {0,1,2}, and let X = {x ; | 1 < i < 9}; i.e., LSI = 3 and \X\ = 9. 
Suppose 7j and 7 2 have the ternary access trees as shown in Figure 4.2 and 
therefore achieve Kraft access. These trees indicate that, for instance, 
p{\ 2 - x 6 ) = 0121_ and p{\ 8 - x 5 ) =12202. The only time we have overlapping 
access sets is for d * ID such that c/(l) = x 6 , x 7 , or x 8 ; i.e., y x ( p{d)) = x 6 , x 7 , 
or x 8 . So we can certainly represent any pairs of strings in Xj- X, where 
Xj <£ ix e ,X7,x 8 }. It is also possible to represent the pairs of strings x 6 - x p x 7 - x 2 , 
and x 8 - Xj where x, £ {xj,x 2 }. I 



Figure 4.2. Access trees for $\gamma_1$ and $\gamma_2$ of Example 4.3.



So if $\gamma_i$ and $\gamma_j$ do overlap in access of cell $k$, then it is not possible for $\mathbb{D}$ to
include a string $d_1$ such that $\rho(d_1(i))$ has some value $b_1$ in cell $k$ and $\rho(d_1(j))$
has some value $b_2 \neq b_1$ in cell $k$. If $\gamma_i$ meets Kraft access, then its access tree is
full, so there will be at least $|\mathcal{B}|$ elements $d \in \mathbb{D}$ such that $\gamma_i(\rho(d))$ accesses cell $k$.
Similarly for $\gamma_j$. Let $S$ be the set of strings in the domain that agree with $d_1$ in
every position except the $j$-th:

\[ S = \{d \in \mathbb{D} \mid d(n) = d_1(n) \text{ for all } n \neq j\}. \]

Then $|S| \le |R_j| - (|\mathcal{B}| - 1)$, since there must be at least $|\mathcal{B}| - 1$ characters $r$ in $R_j$
such that we cannot represent any string in $X^*$ whose $i$-th component is $d_1(i)$ and
whose $j$-th component is $r$.

Lemma 4.1. Consider any representation $\rho: \mathbb{D} \to \mathcal{B}^+$ and let $\gamma_i, \gamma_j \in \Gamma$
achieve Kraft access. Suppose that for $d_1 \in \mathbb{D}$, $\gamma_i$ and $\gamma_j$ access some cell in
common. Then

\[ \Bigl|\bigcup_{d \in \mathbb{D},\ d(i) = d_1(i)} \{d(j)\}\Bigr| \le |R_j| - |\mathcal{B}| + 1. \]

Proof: $\gamma_i(\rho(d_1))$ and $\gamma_j(\rho(d_1))$ access some cell $k$ in common. Since $\gamma_i$ meets
Kraft access, then for $\rho(d_1) \subseteq m_1$, $\gamma_i(\rho(d_1))$ corresponds to $m_1(k) = b \in \mathcal{B}$.
Since $\gamma_j$ also meets Kraft access, there are at least $|\mathcal{B}| - 1$ values for $d(j)$ that do
not have $m(k) = b$. Thus, $\bigl|\bigcup \{d(j)\}\bigr| \le |R_j| - |\mathcal{B}| + 1$. ∎

Recall that for a pair of table lookup questions $\gamma_i$ and $\gamma_j$, where $i < j$, then
$R_i = R_j$ if $\gamma_i$ and $\gamma_j$ achieve Kraft access and $|\mathcal{B}| > 2$. If $d(i) = x \in X$, then all we
know is that $d(j) \in X \cup \{\varnothing\}$. On the other hand, we know that if $d(i) = \varnothing$, then
$d(j) = \varnothing$; in this case there are $|X|$ combinations of $d(i)$ and $d(j)$ that do not
exist for any $d \in \mathbb{D}$. So perhaps there could be some representation scheme that
would allow us to overlap accesses. The next theorem follows from Lemma 4.1 and
shows that there is no such scheme.

Theorem 4.4. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^+$, where $|\mathcal{B}| > 2$, and let
$\gamma_i, \gamma_j \in \Gamma$ each achieve Kraft access. Then, for all $d \in \mathbb{D}$, $\gamma_i(\rho(d))$ and
$\gamma_j(\rho(d))$ access no cells in common.

Proof: Assume there exists $d_1 \in \mathbb{D}$ such that $\gamma_i(\rho(d_1))$ and $\gamma_j(\rho(d_1))$ each
access cell $k$, $i < j$. Then since all $\gamma_i$ achieve Kraft access, for all $b \in \mathcal{B}$ there is
some $d_2 \in \mathbb{D}$ such that $\gamma_i(\rho(d_2))$ causes cell $k$ to be accessed and $m_2(k) = b$,
where $\rho(d_2) \subseteq m_2$. Since not all leaf descendants of node $k$ in the access tree for
$\gamma_i$ can be labelled $\varnothing$, there is some $d_3 \in \mathbb{D}$ such that $d_3(i) \neq \varnothing$ and $\gamma_i(\rho(d_3))$
accesses cell $k$. If we let $\mathbb{D}_1 = \{d \in \mathbb{D} \mid d(i) = d_3(i)\}$, then we have

\[ \Bigl|\bigcup_{d \in \mathbb{D}_1} \{d(j)\}\Bigr| \le |R_j| - |\mathcal{B}| + 1 \le |X| + 1 - |\mathcal{B}| + 1 < |X|. \]

But by the way we have defined a problem domain, there are $|X|$ data bases $d \in \mathbb{D}$
that differ from $d_3$ only in the $j$-th position. This gives a contradiction and so, for
all $d \in \mathbb{D}$, $\gamma_i(\rho(d))$ and $\gamma_j(\rho(d))$ do not have overlapping access sets. ∎

Since for any $d \in \mathbb{D}$ each $\gamma_i$ accesses a distinct set of cells, the total number of
accesses made by the various $\gamma$'s cannot be more than $|\rho(d)|$.

Theorem 4.5. Consider any representation $\rho: \mathbb{D} \to \mathcal{B}^+$ and assume all $\gamma_i \in \Gamma$
achieve Kraft access. If $\gamma_i$ and $\gamma_j$ access no cells in common, then

\[ \sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)|. \]

From theorems 4.4 and 4.5 we can immediately get the following result.

Corollary 4.5.1. Consider any representation $\rho: \mathbb{D} \to \mathcal{B}^+$, where $|\mathcal{B}| > 2$, and
let all $\gamma_i \in \Gamma$ achieve Kraft access. Then

\[ \sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)|. \]

Unfortunately, Theorem 4.4 does not hold for $|\mathcal{B}| = 2$. In other words, it is
possible for $\gamma_i$ and $\gamma_j$ to achieve Kraft access and also access some cell in common.

Example 4.4. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b\}$, and $\mathbb{D} = \{\lambda\} \cup X^2 \cup X^3$. Consider the
representation $\rho: \mathbb{D} \to \mathcal{B}^+$ defined as follows:

    $d$         $\rho(d)$
    $\lambda$         10_0_
    aa        0100_
    ab        0110_
    ba        1100_
    bb        1110_
    aaa       01010
    aab       01011
    aba       01110
    abb       01111
    baa       11010
    bab       11011
    bba       11110
    bbb       11111

Since $\lambda \in \mathbb{D}$, $R_i = \{a, b, \varnothing\}$ for $i \in \{1,2,3\}$. Possible access trees for $\gamma_1, \gamma_2, \gamma_3$ are
shown in Figure 4.3. Notice that $\gamma_1$ and $\gamma_2$ may both access cell 1, and we have
the following storage allocation:

Without altering the access trees, we could extend $\rho$ and $\mathbb{D}$ so as to also include the
element $a \in X$, by letting $\rho(a) = 00\_0$. It would not, however, be possible to
similarly include $b$ in the domain, because $\rho(b)$ would require cell 1 to be set to 1
and also to 0.

Figure 4.3. Access trees for $\gamma_1, \gamma_2, \gamma_3$ of Example 4.4.
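The representation of Example 4.4 can be exercised directly. The sketch below is illustrative only: the three access trees are a reconstruction that is consistent with the table and with the remarks about cell 1, not a copy of Figure 4.3, and all function names are hypothetical. `None` stands in for the null answer, and the stored length of a representation counts only the cells that are actually used (those not written as `_`).

```python
from fractions import Fraction

NULL = None  # stands in for the null answer

RHO = {
    "": "10_0_", "aa": "0100_", "ab": "0110_", "ba": "1100_", "bb": "1110_",
    "aaa": "01010", "aab": "01011", "aba": "01110", "abb": "01111",
    "baa": "11010", "bab": "11011", "bba": "11110", "bbb": "11111",
}

# Leaf = answer; internal node = (cell, subtree if cell holds 0, subtree if 1).
# Assumed trees: gamma_1 and gamma_2 both consult cell 1.
TREES = {
    1: (0, "a", (1, NULL, "b")),
    2: (1, NULL, (2, "a", "b")),
    3: (3, NULL, (4, "a", "b")),
}


def run(tree, cells):
    """Return (answer, list of cells accessed) for one stored representation."""
    accessed = []
    while isinstance(tree, tuple):
        cell, if_zero, if_one = tree
        bit = cells[cell]
        if bit not in "01":
            raise ValueError(f"cell {cell} is unused in {cells!r}")
        accessed.append(cell)
        tree = if_zero if bit == "0" else if_one
    return tree, accessed


def expected(i, d):
    """Problem-domain answer: the i-th element of d, or NULL past the end."""
    return d[i - 1] if i <= len(d) else NULL


def stored_length(cells):
    return sum(1 for c in cells if c != "_")


def total_accesses(d):
    return sum(len(run(TREES[i], RHO[d])[1]) for i in TREES)


storage_sum = sum(Fraction(1, 2 ** stored_length(c)) for c in RHO.values())
```

With these assumed trees every question is answered correctly on every stored list, `total_accesses("bab")` exceeds `stored_length(RHO["bab"])`, and `storage_sum` stays below 1, matching the observations made about this example.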



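We can see that Corollary 4.5.1 does not hold for $|\mathcal{B}| = 2$, since for $d = bab$,
$\rho(bab) = 11011$ and:

\[ \sum_{i=1}^{3} \#[\gamma_i(11011)] = 2 + 2 + 2 = 6 > |\rho(bab)|. \]

Notice also that $\rho$ does not achieve Kraft storage:

\[ \sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = 4 \cdot 2^{-4} + 2^{-3} + 8 \cdot 2^{-5} < 1. \]
∎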

The following lemma shows for $|\mathcal{B}| = 2$ that if $\gamma_i$ and $\gamma_j$ each have Kraft
access, and if they both access cell $k$, then the access trees for $\gamma_i$ and $\gamma_j$ each have
a node labelled $k$ leading to a leaf $\varnothing$ via a branch labelled $b \in \mathcal{B}$.

Lemma 4.2. Let $|\mathcal{B}| = 2$ and let $b, b' \in \mathcal{B}$, $b \neq b'$. Consider a representation
$\rho: \mathbb{D} \to \mathcal{B}^+$, and assume that $\gamma_i, \gamma_j \in \Gamma$ achieve Kraft access and that

\[ k \in \Bigl(\bigcup_{d \in \mathbb{D}} \{[\gamma_i(\rho(d))]\} \cap \bigcup_{d \in \mathbb{D}} \{[\gamma_j(\rho(d))]\}\Bigr). \]

Choose elements $x_1, x_2 \in R_i$ and $x_3, x_4 \in R_j$, such that $m_1(k) = b$,
$m_2(k) = b'$, $m_3(k) = b$, $m_4(k) = b'$, where $m_i \supseteq \rho(x_i)$. Then either
$x_1 = x_3 = \varnothing$ or $x_2 = x_4 = \varnothing$.

Proof: Clearly $\rho$ cannot represent a string $d_1$ where $d_1(i) = x_1$ and $d_1(j) = x_4$ or
a string $d_2$ where $d_2(i) = x_2$ and $d_2(j) = x_3$. There are two cases to consider:
(i) If $x_1 \in X$ then $x_4 = \varnothing$, since we do not necessarily need to represent $d(i) \in X$
and $d(j) = \varnothing$, but we must be able to represent $d(i) \in X$ and $d(j) \in X$. This tells
us that $x_3 \neq \varnothing$ and so $x_3 \in X$. Since we cannot represent $d_2$, then $x_2 = \varnothing$.
(ii) If $x_1 = \varnothing$, then $x_4 \in X$ and $x_2 \in X$. Since we cannot represent $d_2$, then
$x_3 = \varnothing$. ∎

In Example 4.4, the access sets for $\gamma_1$ and $\gamma_2$ each included the cell 1. Notice that
in each of their access trees, the left branch from the node labelled 1 led to the leaf
$\varnothing$; using the terminology of Lemma 4.2, $x_1 = x_3 = \varnothing$.

Lemma 4.2 allows us to prove that at most one cell can be in two access sets, if
we achieve Kraft access.

Theorem 4.6. Assume $\gamma_i, \gamma_j \in \Gamma$ achieve Kraft access. Then the access sets
for $\gamma_i$ and $\gamma_j$ contain at most one cell in common.

Proof: If $\gamma_i$ and $\gamma_j$ access two cells in common then by Lemma 4.2 each tree has
two leaves $\varnothing$, which violates our assumption of Kraft access. ∎

We can, in fact, make the even stronger statement that if we achieve Kraft access
for all of $\Gamma$ then any table lookup question $\gamma_i \in \Gamma$ can access only one cell that any
other $\gamma_j \in \Gamma$ accesses. The following theorem formalizes this.

Theorem 4.7. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^+$, and assume that all
$\gamma_i \in \Gamma$ achieve Kraft access. If $\gamma_i, \gamma_j$ both access cell $k_1$ and $\gamma_i, \gamma_k$ both
access cell $k_2$, then $k_1 = k_2$.

Proof: By Lemma 4.2, we know that node $k_1$ in $\gamma_i$'s access tree leads to a leaf
labelled $\varnothing$. But also node $k_2$ in the tree for $\gamma_i$ must lead to a leaf $\varnothing$. Since $\gamma_i$
achieves Kraft access, there can be at most one leaf labelled $\varnothing$, and so $k_1 = k_2$. ∎

This gives us a result similar to Theorem 4.5, for the case where we allow access
overlap.

Corollary 4.7.1. Consider any representation $\rho: \mathbb{D} \to \mathcal{B}^+$ and assume that all
$\gamma_i \in \Gamma$ achieve Kraft access. If we allow access overlap, then

\[ \sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)| + |\Gamma| - 1. \]

Proof: From Theorem 4.5 we recall that where there is no access overlap, then

\[ \sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)|. \]

Now from Theorem 4.7 we know that each $\gamma_i$ can have at most one cell in common
with any other $\gamma_j$. So

\[ \sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)| + |\Gamma| - 1. \]
∎

Example 4.5. Recall Example 4.4, where

\[ \sum_{i=1}^{3} \#[\gamma_i(\rho(bab))] = 2 + 2 + 2 = 6 \le |\rho(bab)| + |\Gamma| - 1 = 5 + 3 - 1 = 7. \]
∎

The next example verifies that, in fact, the bound in the above corollary is the best
possible. We achieve this bound when all $\gamma_i \in \Gamma$ access some cell in common.

Example 4.6. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b\}$, and $\mathbb{D} = \{\lambda\} \cup X^3$. Consider the
representation $\rho: \mathbb{D} \to \mathcal{B}^+$ defined as follows:

    $d$         $\rho(d)$
    $\lambda$         10_0
    aaa       0101
    aab       0100
    aba       0111
    abb       0110
    baa       1101
    bab       1100
    bba       1111
    bbb       1110

Consider the access trees for $\gamma_1, \gamma_2, \gamma_3$ shown in Figure 4.4. Then it is easy to see
that

\[ \sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)| + |\Gamma| - 1. \]

In particular,

\[ \sum_{i=1}^{3} \#[\gamma_i(\rho(\lambda))] = 5 \le 3 + 3 - 1 \]

and

\[ \sum_{i=1}^{3} \#[\gamma_i(\rho(bbb))] = 6 \le 4 + 3 - 1. \]
∎

Figure 4.4. Access trees for $\gamma_1, \gamma_2, \gamma_3$ of Example 4.6.
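The claim that the bound of Corollary 4.7.1 is attained in Example 4.6 can be checked by enumeration. As before, this is a sketch under stated assumptions: the access trees are a reconstruction consistent with the table and with the access counts quoted in the example (all three questions share cell 1), not a copy of Figure 4.4, and the helper names are hypothetical.

```python
from itertools import combinations

NULL = None  # stands in for the null answer

RHO = {
    "": "10_0", "aaa": "0101", "aab": "0100", "aba": "0111", "abb": "0110",
    "baa": "1101", "bab": "1100", "bba": "1111", "bbb": "1110",
}

# Leaf = answer; internal node = (cell, subtree if cell holds 0, subtree if 1).
TREES = {
    1: (0, "a", (1, NULL, "b")),
    2: (1, NULL, (2, "a", "b")),
    3: (3, (1, NULL, "b"), "a"),
}


def run(tree, cells):
    """Return (answer, list of cells accessed) for one stored representation."""
    accessed = []
    while isinstance(tree, tuple):
        cell, if_zero, if_one = tree
        bit = cells[cell]
        if bit not in "01":
            raise ValueError(f"cell {cell} is unused in {cells!r}")
        accessed.append(cell)
        tree = if_zero if bit == "0" else if_one
    return tree, accessed


def stored_length(cells):
    return sum(1 for c in cells if c != "_")


def total_accesses(d):
    return sum(len(run(TREES[i], RHO[d])[1]) for i in TREES)


def bound(d):
    """Right-hand side of Corollary 4.7.1 for the stored list d."""
    return stored_length(RHO[d]) + len(TREES) - 1


def access_set(i):
    """Union over the domain of the cells gamma_i accesses."""
    cells = set()
    for rep in RHO.values():
        cells.update(run(TREES[i], rep)[1])
    return cells


tight = [d for d in RHO if total_accesses(d) == bound(d)]
shared = {pair: access_set(pair[0]) & access_set(pair[1])
          for pair in combinations(TREES, 2)}
```

Here `tight` lists the stored lists for which the bound holds with equality, and `shared` shows that every pair of questions overlaps in exactly one cell, in line with Theorems 4.6 and 4.7.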

Essentially, we were able to allow access overlap in Example 4.4 because we
did not need to represent the strings $a\varnothing\varnothing$ or $b\varnothing\varnothing$. This was because we restricted
$\mathbb{D}$ so that $X^1 \not\subseteq \mathbb{D}$. If it is necessary, however, to represent the situation where
$\gamma_1(\rho(d)) \neq \varnothing$ and $\gamma_2(\rho(d)) = \varnothing$, then no overlap between $\gamma_1$ and $\gamma_2$ is possible.
In fact, for $|\mathcal{B}| = 2$, this works in both directions, as the next theorem shows.

Theorem 4.8. Let $|\mathcal{B}| = 2$ and let $\gamma_i, \gamma_j \in \Gamma$ each achieve Kraft access. There
exists a representation $\rho: \mathbb{D} \to \mathcal{B}^+$ such that $\gamma_i$ and $\gamma_j$ access some cell in
common if and only if $X^k \not\subseteq \mathbb{D}$ for all $i \le k < j$.

Proof: ($\Rightarrow$) As in the proof of Theorem 4.4, we can assume without loss of
generality that $\gamma_i(\rho(d_1)) \neq \varnothing$. Then $|R(\gamma_j(\mathbb{D}))| = |X| + 1$. But by Lemma 4.1,
if $\gamma_i$ and $\gamma_j$ access some cell in common, then $\gamma_j(\rho(d))$ can take on at most

\[ |R_j| - |\mathcal{B}| + 1 \le |X| + 2 - |\mathcal{B}| \le |X| < |X| + 1 \]

values. So $\gamma_i$ and $\gamma_j$ access no cells in common.
($\Leftarrow$) If there exists no $k$ such that $i \le k < j$, then

\[ \gamma_i(\rho(d)) \in X \Rightarrow \gamma_j(\rho(d)) \in X \]
\[ \text{and}\quad \gamma_i(\rho(d)) = \varnothing \Rightarrow \gamma_j(\rho(d)) = \varnothing. \]

We can always construct a representation $\rho$ such that $\gamma_i$ and $\gamma_j$ will both access
some cell $k$. Let the access tree for $\gamma_i$ have exactly one node corresponding to an
access of cell $k$, and let this node be at a greater depth than any other nonleaf
node. Let the left branch from this node lead to a leaf labeled $\varnothing$ and the right
branch lead to some other leaf $x_i \in X$. Then construct the access tree for $\gamma_j$ such
that the root is labeled $k$, and its left branch leads directly to a leaf labeled $\varnothing$.
This allows us to represent all strings $X \cdot X$ and $\varnothing \cdot \varnothing$, and yet $\gamma_i$ and $\gamma_j$ both
access cell $k$. ∎

4.3 Achieving Kraft Storage and Kraft Access

We have seen in Example 4.4 that it is possible to have Kraft access with 
overlapping access sets, although that particular representation did not achieve 
Kraft storage. This leads us to wonder whether it is even possible to achieve both 
Kraft storage and Kraft access; the following example shows us that it is. 

Example 4.7. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b\}$, $\mathbb{D} = \{\lambda\} \cup X^2$, and define $\rho: \mathbb{D} \to \mathcal{B}^+$ by

    $d$         $\rho(d)$
    $\lambda$         0
    aa        100
    ab        101
    ba        110
    bb        111

Now consider the access trees for $\gamma_1$ and $\gamma_2$ as shown in Figure 4.4. Clearly $\gamma_1$
and $\gamma_2$ each achieve Kraft access. It is also the case, however, that $\rho$ achieves
Kraft storage, since

\[ \sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = 2^{-1} + 4 \cdot 2^{-3} = 1. \]

Now notice that

\[ \sum_{i=1}^{2} \#[\gamma_i(\rho(ab))] = \#[\gamma_1(\rho(ab))] + \#[\gamma_2(\rho(ab))] = 2 + 2 = 4 > |\rho(d)| \]

and so Corollary 4.5.1 does not hold for $|\mathcal{B}| = 2$, even when we achieve both Kraft
storage and Kraft access. ∎
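Both Kraft sums for Example 4.7 can be verified with exact rational arithmetic. This is a sketch under assumptions: the two access trees are a reconstruction consistent with the table (both questions read cell 0 first), not a copy of the figure, and the function names are hypothetical. Because every leaf in these trees carries a distinct answer, summing over leaves is the same as summing over answers.

```python
from fractions import Fraction

NULL = None  # stands in for the null answer

RHO = {"": "0", "aa": "100", "ab": "101", "ba": "110", "bb": "111"}

# Leaf = answer; internal node = (cell, subtree if cell holds 0, subtree if 1).
TREES = {
    1: (0, NULL, (1, "a", "b")),
    2: (0, NULL, (2, "a", "b")),
}


def leaf_depths(tree, depth=0):
    """Yield (answer, depth) for every leaf of a binary access tree."""
    if isinstance(tree, tuple):
        _cell, if_zero, if_one = tree
        yield from leaf_depths(if_zero, depth + 1)
        yield from leaf_depths(if_one, depth + 1)
    else:
        yield tree, depth


def kraft_access_sum(tree):
    """Sum of 2 ** (-depth) over the leaves of the tree."""
    return sum(Fraction(1, 2 ** depth) for _answer, depth in leaf_depths(tree))


def kraft_storage_sum(rho):
    """Sum of 2 ** (-|rho(d)|) over the domain; every cell here is used."""
    return sum(Fraction(1, 2 ** len(cells)) for cells in rho.values())


def accesses(tree, cells):
    """Number of cells read when the tree is walked over `cells`."""
    count = 0
    while isinstance(tree, tuple):
        cell, if_zero, if_one = tree
        tree = if_zero if cells[cell] == "0" else if_one
        count += 1
    return count


total_for_ab = sum(accesses(tree, RHO["ab"]) for tree in TREES.values())
```

Under these assumed trees the storage sum and both access sums equal 1, while `total_for_ab` exceeds `len(RHO["ab"])` because cell 0 is read by both questions.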



The main results of this section, theorems 4.9 and 4.10, tell us that if we achieve
both Kraft storage and Kraft access then our domain must be of the form $\mathbb{D} = X^n$
or $\mathbb{D} = \{\lambda\} \cup X^n$.

We are now in a position to prove our first of two main results of this section:
if we are to have Kraft storage and access and not allow overlapping access sets,
then $\mathbb{D} = X^n$. We first prove the following lemma.

Figure 4.4. Access trees corresponding to $\gamma_1$ and $\gamma_2$ of Example 4.7.

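Lemma 4.3. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^+$ and assume that all $\gamma_i \in \Gamma$
achieve Kraft access. Then, for $k \le |\Gamma|$,

\[ \sum_{s \in R^k} |\mathcal{B}|^{-\sum_{i=1}^{k} \#[\gamma_i(\rho(s))]} = 1, \tag{4.1} \]

where $R^k \triangleq R_1 \cdot R_2 \cdot \ldots \cdot R_k$.

Proof: We prove this result by induction on $k$.

Basis: Since $\gamma_1$ achieves Kraft access, by Theorem 3.7 we have

\[ \sum_{s \in R^1} |\mathcal{B}|^{-\#[\gamma_1(\rho(s))]} = \sum_{s \in R_1} |\mathcal{B}|^{-\alpha_1(s)} = 1. \]

Induction step: Let $R_{k+1} = \{r_1, r_2, \ldots, r_n\}$, and assume that (4.1) holds for $R^k$.
Then

\[ \sum_{s \in R^{k+1}} |\mathcal{B}|^{-\sum_{i=1}^{k+1} \#[\gamma_i(\rho(s))]}
   = \sum_{s \in R^k \cdot r_1} |\mathcal{B}|^{-\sum_{i=1}^{k+1} \#[\gamma_i(\rho(s))]}
   + \cdots
   + \sum_{s \in R^k \cdot r_n} |\mathcal{B}|^{-\sum_{i=1}^{k+1} \#[\gamma_i(\rho(s))]}. \]

Since $\gamma_{k+1}$ achieves Kraft access, then for $r \in R^k$ and $r_j \in R_{k+1}$ we have

\[ \#[\gamma_{k+1}(\rho(r \cdot r_j))] = \alpha_{k+1}(r_j). \]

This gives us

\[ \sum_{s \in R^{k+1}} |\mathcal{B}|^{-\sum_{i=1}^{k+1} \#[\gamma_i(\rho(s))]}
   = \sum_{s \in R^k} |\mathcal{B}|^{-\sum_{i=1}^{k} \#[\gamma_i(\rho(s))]} \cdot \sum_{j=1}^{n} |\mathcal{B}|^{-\alpha_{k+1}(r_j)}. \]

By our inductive assumption and since we are given that $\gamma_{k+1}$ achieves Kraft access
for $k + 1 \le |\Gamma|$, this becomes:

\[ \sum_{s \in R^{k+1}} |\mathcal{B}|^{-\sum_{i=1}^{k+1} \#[\gamma_i(\rho(s))]} = 1. \]
∎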

We now prove our desired theorem. 



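Theorem 4.9. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^+$ which achieves Kraft
storage and assume that all $\gamma_i \in \Gamma$ achieve Kraft access. If for all $\gamma_i, \gamma_j \in \Gamma$

\[ \bigcup_{d \in \mathbb{D}} \{[\gamma_i(\rho(d))]\} \cap \bigcup_{d \in \mathbb{D}} \{[\gamma_j(\rho(d))]\} = \emptyset, \]

then $\mathbb{D} = X^n$.

Proof: Let $R_i^{|\Gamma|}$ denote the set of strings of length $|\Gamma|$, where each element is
chosen from $R_i$. Define the one-to-one function $g: \mathbb{D} \to R_i^{|\Gamma|}$ by

\[ g(d) = \gamma_1(d) \cdot \gamma_2(d) \cdot \ldots \cdot \gamma_{|\Gamma|}(d). \]

Assume that $\lambda \in \mathbb{D}$. Then, by Corollary 4.2.1, $R(\gamma_i(\mathbb{D})) = X \cup \{\varnothing\}$ for all $i$. But
for all $\gamma_i \in \Gamma$, $\gamma_1(\rho(d)) = \varnothing$ implies that $\gamma_i(\rho(d)) = \varnothing$. So $g(\mathbb{D}) \neq R_i^{|\Gamma|}$ since,
e.g., $\varnothing \cdot X^{|\Gamma| - 1} \not\subseteq g(\mathbb{D})$. Because we have Kraft access and no overlapping access
sets,

\[
\begin{aligned}
\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|}
  &\le \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-\sum_{i} \#[\gamma_i(\rho(d))]} && \text{by Theorem 4.5} \\
  &= \sum_{s \in g(\mathbb{D})} |\mathcal{B}|^{-\sum_{i} \#[\gamma_i(\rho(s))]} && \text{since $g$ is 1-1} \\
  &< \sum_{s \in R_i^{|\Gamma|}} |\mathcal{B}|^{-\sum_{i} \#[\gamma_i(\rho(s))]} && \text{since } g(\mathbb{D}) \subset R_i^{|\Gamma|} \\
  &= 1 && \text{by Lemma 4.3.}
\end{aligned}
\]

This gives a contradiction, since we know that $\rho$ achieves Kraft storage. So $\lambda \notin \mathbb{D}$,
which by Theorem 4.2 says that $\mathbb{D} = X^n$. ∎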



As we saw in Example 4.7, the condition that there be no access set overlap is 
necessary in the above theorem. 

From theorems 4.9 and 4.4, we have the following corollary. 

Corollary 4.9.1. Let |#l > 2, and consider a representation p-\D -* 8 which 

achieves Kraft storage. If all y i £ Y achieve Kraft access, then ID = X . 

Because we shall frequently consider domains of the form $\mathbb{D} = \bigcup_{i=0}^{k} X^i$, it is worth
noting that with a domain in this form, it is not possible to attain both Kraft
storage and Kraft access.

Corollary 4.9.2. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$, for $k > 0$, and consider a representation
$\rho: \mathbb{D} \to \mathcal{B}^{+}$. Assume that all $\gamma_i \in \Gamma$ achieve Kraft access. Then $\rho$ does not
achieve Kraft storage.

Although we have proved that Kraft access, Kraft storage, and no access set
overlap imply that $\mathbb{D} = X^n$, we know by Example 4.7 that it is also possible to
have, for some domain $\mathbb{D} \neq X^n$, both Kraft storage and access with access overlap.
Example 4.7 is not an isolated case; i.e., the next example illustrates that it is not
necessary that $|X| = 2$ or that $\mathbb{D} = \{\lambda\} \cup X^2$.

Example 4.8. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b,c,d\}$, $\mathbb{D} = \{\lambda\} \cup X^3$, and define $\rho: \mathbb{D} \to \mathcal{B}^{+}$
as indicated in Figure 4.5. Such a definition is possible because only cell 0 is in two
access sets, and $m(0) = 1$ for all $d \in \mathbb{D}$ except $d = \lambda$. For instance,
    $d$      $\rho(d)$
    λ        0
    aaa      10__0011
    aba      10__0111
    aca      10__1011
    ada      10__1111
    bac      110_0000
    bbc      110_0100
    bcc      110_1000
    bdc      110_1100
    cad      11100010
    cbd      11100110
    ccd      11101010
    cdd      11101110



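This system has overlapping access sets and achieves Kraft access. In fact, we also
have Kraft storage, since

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = 2^{-1} + 4^2 \cdot 2^{-6} + 4^2 \cdot 2^{-7} + 2 \cdot 4^2 \cdot 2^{-8} = 1. \qquad ∎$$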
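The Kraft sum of Example 4.8 can be checked mechanically. The sketch below is only an illustration under stated assumptions: the codeword lengths are read off the listed codewords (1 cell for λ, 6 cells when the first element is a, 7 when it is b, 8 when it is c), and the length 8 for lists starting with d is an assumption taken from the term $2 \cdot 4^2 \cdot 2^{-8}$, since no such codeword is listed. The helper name `code_length` is hypothetical.

```python
from fractions import Fraction
from itertools import product

X = "abcd"

def code_length(d):
    """Number of occupied cells |rho(d)| assumed for Example 4.8."""
    if d == "":
        return 1                      # rho(lambda) = 0
    # Lengths by first element; 8 for 'd' is an assumption (see lead-in).
    return {"a": 6, "b": 7, "c": 8, "d": 8}[d[0]]

# Domain D = {lambda} U X^3, with lambda written as the empty string.
domain = [""] + ["".join(t) for t in product(X, repeat=3)]

# Exact Kraft sum over the binary machine alphabet.
kraft_sum = sum(Fraction(1, 2 ** code_length(d)) for d in domain)
print(len(domain), kraft_sum)
```

Exact rational arithmetic is used so that the comparison with 1 does not depend on floating-point rounding.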



Figure 4.5. Access trees corresponding to $\gamma_1$, $\gamma_2$, and $\gamma_3$ of Example 4.8.



Now we want to determine for what possible domains $\mathbb{D}$ we can get Kraft
storage and access if we allow overlapping access sets. Certainly we know that
$|\mathcal{B}| = 2$, and recalling examples 4.7 and 4.8 we might suppose that $\mathbb{D}$ is of the form
$\{\lambda\} \cup X^n$, as is indeed the case.
Lemma 4.4. Let $|\mathcal{B}| = 2$ and consider a representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$ which
achieves Kraft storage. Assume that $|\Gamma| > 1$ and $\gamma_i, \gamma_j \in \Gamma$ achieve Kraft
access, and

$$\bigcup_{d \in \mathbb{D}} \{\lfloor \gamma_i(\rho(d)) \rfloor\} \;\cap\; \bigcup_{d \in \mathbb{D}} \{\lfloor \gamma_j(\rho(d)) \rfloor\} = \{k\}.$$

Then the access trees for $\gamma_i$ and $\gamma_j$ each have root node labelled $k$.

Proof: Let i < j. By Lemma 4.2, we know that t /?, and * ftj. Assume that 
the access tree for 7, has root with label tj * k and that the access tree for 7j has 
root t 2 . Without loss of generality, let the leaf in tree y l with label have t, = 0; 
i.e., p{ 0) has /n(tj) = 0. Since t : ^ k the node k must be a descendant of tj, and 
there exists x, * X such that pix^) also has m{l x ) = 0. Clearly there is some 
x 2 * X such that p{x z ) has w(tj) = 1. From Lemma 4.2 we know that in the tree 
7, we must also have the leaf a descendant of node k, with wi(k) = 0. Thus 
t/j $ ID, where c/j(i) = Xj and c/ 1 ( j) = since p would require setting w(k) = 1 
and wi(k) = 0. Since 7,, 7, achieve Kraft access, then by Theorem 4.8 it must be 
the case that X p $ ID for i < p < j. On the other hand, we know that p docs 
achieve Kraft storage. So by Theorem 3.3 there is some d z t ID such that 
<J z (i) = x 2 and t/ 2 (j) = 0, which contradicts the fact that X p g ID for 1 < p < j. 
Thus, tj = k and we can similarly show that t 2 = k. I 

This lemma allows us to prove our second main result of the section: if we have
Kraft access, Kraft storage, and access overlap, then $\mathbb{D} = \{\lambda\} \cup X^n$.

Theorem 4.10. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$ which achieves Kraft
storage, and assume that all $\gamma_i \in \Gamma$ achieve Kraft access. If there exist
$\gamma_i, \gamma_j \in \Gamma$ such that $\gamma_i$ and $\gamma_j$ have overlapping access sets, then
$\mathbb{D} = \{\lambda\} \cup X^n$.
Proof: Assume $\gamma_i$ and $\gamma_j$ both access cell $k$, and assume there is some $\gamma_m \in \Gamma$ that
does not access cell $k$. Then we can represent $d_1(i) = \emptyset$, $d_1(j) = \emptyset$, $d_1(m) \in X$,
indicating we don't have Kraft storage, a contradiction. So if $\gamma_i$ and $\gamma_j$ both access
cell $k$, then for all $\gamma_m \in \Gamma$, $\gamma_m$ accesses cell $k$. By Lemma 4.4, $\gamma_m$ has root node $k$
with one branch to leaf $\emptyset$. Thus, we can represent exactly the strings $\lambda$ and
$X^n$, and so $\mathbb{D} = \{\lambda\} \cup X^n$. ∎

In Theorem 4.5 we showed that if we meet Kraft access and have no access set
overlap, then $|\rho(d)|$ is an upper bound on the total number of accesses made in
reading all the elements in $\rho(d)$. We now show that for any $|\mathcal{B}| \ge 2$, if we achieve
Kraft storage then every cell must be accessed in answering some question $\gamma_i$. Thus
$|\rho(d)|$ is a lower bound on the total number of accesses to read $\rho(d)$.

Theorem 4.11. If the representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$ achieves Kraft storage, then
for all $d \in \mathbb{D}$:

$$k \in D(\rho(d)) \;\Rightarrow\; k \in \bigcup_{\gamma_i \in \Gamma} \{\lfloor \gamma_i(\rho(d)) \rfloor\}.$$

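Proof: We define $S$ to be the set of cells accessed by asking of some $d_1 \in \mathbb{D}$ each
of the questions $\gamma_i$: $S = \bigcup_{\gamma_i \in \Gamma} \{\lfloor \gamma_i(\rho(d_1)) \rfloor\}$. We want to prove that

$$k \in D(\rho(d_1)) \;\Rightarrow\; k \in S.$$

Define the representation $\rho_1: \mathbb{D} \to \mathcal{B}^{+}$ by:

$$\rho_1(d) = \begin{cases} \rho(d) & \text{for } d \neq d_1 \\ \{(k, m(k)) \mid k \in S\} & \text{for } d = d_1 \end{cases}$$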
Then p x is a representation because p is: for d 2 , d 3 £ ID where d z * d v d z * d x , 
we have 

p x {d 2 ) n p : (d 3 ) * => p{d 2 ) p{d 3 ) * => d 2 = d 3 , 
and for d z t iD, t/^ * c/j, we have 

- Px {d 2 ) n ^(t/j) * *(Vy,< D(7 i (/'(c/ 1 )) = 7,(/>U 2 ))) *t/ 2 = t/ r 
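Assume there exists $k \in D(\rho(d_1))$ such that $k \notin S$. Then $|\rho_1(d_1)| = |S| < |\rho(d_1)|$, and so

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho_1(d)|} = \sum_{d \in \mathbb{D} - \{d_1\}} |\mathcal{B}|^{-|\rho(d)|} + |\mathcal{B}|^{-|\rho_1(d_1)|} > \sum_{d \in \mathbb{D} - \{d_1\}} |\mathcal{B}|^{-|\rho(d)|} + |\mathcal{B}|^{-|\rho(d_1)|} = \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|}.$$

This violates the fact that $\rho$ achieves Kraft storage, so $k \in D(\rho(d_1)) \Rightarrow k \in S$. ∎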



Corollary 4.11.1. If the representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$ achieves Kraft storage, then
for all $d \in \mathbb{D}$:

$$\sum_{i=1}^{|\Gamma|} \#\lfloor \gamma_i(\rho(d)) \rfloor \ge |\rho(d)|.$$

From theorems 4.5 and 4.11, we immediately have the following result.

Theorem 4.12. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$ which achieves Kraft
storage and assume that all $\gamma_i \in \Gamma$ achieve Kraft access. If there is no access
set overlap, then for all $d \in \mathbb{D}$:

$$\sum_{i=1}^{|\Gamma|} \#\lfloor \gamma_i(\rho(d)) \rfloor = |\rho(d)|.$$

Since we are in general considering list problems where $\mathbb{D} = \bigcup_{i=0}^{n} X^i$, Theorem 4.9
holds for the cases of particular interest to us.

Corollary 4.12.1. If $\mathbb{D} = \bigcup_{i=0}^{n} X^i$ and the representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$ achieves
Kraft storage, and all $\gamma_i \in \Gamma$ achieve Kraft access, then for all $d \in \mathbb{D}$:

$$\sum_{i=1}^{|\Gamma|} \#\lfloor \gamma_i(\rho(d)) \rfloor = |\rho(d)|.$$

Thus, for list problems where $\mathbb{D} = \bigcup_{i=0}^{n} X^i$, if all $\gamma_i \in \Gamma$ achieve Kraft access then $\gamma_i$
and $\gamma_j$ access no cells in common.
4.4 Storage Consequences of Kraft Access

We conclude this chapter by examining some consequences of Kraft access for
the set of table lookup questions. In particular, achieving Kraft access tells us
something about the minimum and maximum possible values of $|\rho(d)|$:

$$\max_{d \in \mathbb{D}} |\rho(d)| \ge |\Gamma| \cdot (\lceil \log_{|\mathcal{B}|} |R_i| \rceil - 1)$$

and

$$\min_{d \in \mathbb{D}} |\rho(d)| \ge |J| - 1.$$

In general we have even better bounds.

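In order to lower bound $|\rho(d)|$, we first prove two lemmas.

Lemma 4.5. Let $\rho: \mathbb{D} \to \mathcal{B}^{+}$ be any representation. Then

$$(\forall \gamma_i \in \Gamma)(\exists d \in \mathbb{D})\left(\#\lfloor \gamma_i(\rho(d)) \rfloor \ge \lceil \log_{|\mathcal{B}|} |R_i| \rceil\right).$$

Proof: By Theorem 3.13,

$$\max_{r \in R_i} \alpha_i(r) \ge \lceil \log_{|\mathcal{B}|} |R_i| \rceil,$$

so

$$\max_{d \in \mathbb{D}} \#\lfloor \gamma_i(\rho(d)) \rfloor \ge \lceil \log_{|\mathcal{B}|} |R_i| \rceil,$$

and this immediately gives our desired result. ∎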



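Lemma 4.6. Let $\mathbb{D} = \bigcup_i X^i$ and let $\rho: \mathbb{D} \to \mathcal{B}^{+}$ be any representation. Then

$$(\exists d_1 \in \mathbb{D})(\forall \gamma_i \in \Gamma)\left(\#\lfloor \gamma_i(\rho(d_1)) \rfloor \ge \lceil \log_{|\mathcal{B}|} |R_i| \rceil\right).$$

Proof: Let $d_1 \in \mathbb{D}$ be the database defined as follows:

$$d_1 = \{d_1(i) = r_i \mid (r_i \in X) \wedge (\alpha_i(r_i) = \max_{r \in R_i} \alpha_i(r)) \wedge (0 < i \le |\Gamma|)\}.$$

It is always possible to define such a $d_1$. Now recalling Theorem 3.13,

$$\#\lfloor \gamma_i(\rho(d_1)) \rfloor = \max_{r \in R_i} \alpha_i(r) \ge \lceil \log_{|\mathcal{B}|} |R_i| \rceil. \qquad ∎$$

We now show it is always the case that

$$\max_{d \in \mathbb{D}} |\rho(d)| \ge |\Gamma| \cdot \lceil \log_{|\mathcal{B}|} |R| \rceil - |\Gamma| + 1$$

and almost always the case that

$$\max_{d \in \mathbb{D}} |\rho(d)| \ge |\Gamma| \cdot \lceil \log_{|\mathcal{B}|} |R| \rceil.$$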
Theorem 4.13. Let $\mathbb{D} = \bigcup_i X^i$ and $\rho: \mathbb{D} \to \mathcal{B}^{+}$ be any representation. Assume
that all $\gamma_i \in \Gamma$ achieve Kraft access. Then we can conclude the following,
where we write $|R|$ to denote $\min_i |R_i|$.

(a) $\max_{d \in \mathbb{D}} |\rho(d)| \ge |\Gamma| \cdot (\lceil \log_{|\mathcal{B}|} |R| \rceil - 1) + 1$

(b) If there are no overlapping access sets for $\gamma_i \in \Gamma$ or if there is no $j \in \mathbb{N}$
such that $|X| = 2^j$, then

$$\max_{d \in \mathbb{D}} |\rho(d)| \ge |\Gamma| \cdot \lceil \log_{|\mathcal{B}|} |R| \rceil.$$

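Proof: (a) By Corollary 4.7.1,

$$\sum_{i=1}^{|\Gamma|} \#\lfloor \gamma_i(\rho(d)) \rfloor \le |\rho(d)| + |\Gamma| - 1.$$

From Lemma 4.6, there exists a $d_1 \in \mathbb{D}$ such that

$$\sum_{i=1}^{|\Gamma|} \#\lfloor \gamma_i(\rho(d_1)) \rfloor \ge \sum_{i=1}^{|\Gamma|} \lceil \log_{|\mathcal{B}|} |R_i| \rceil.$$

Combining these, we get

$$|\rho(d_1)| \ge \sum_{i=1}^{|\Gamma|} \lceil \log_{|\mathcal{B}|} |R_i| \rceil - |\Gamma| + 1 \ge |\Gamma| \cdot (\lceil \log_{|\mathcal{B}|} |R| \rceil - 1) + 1,$$

and so

$$\max_{d \in \mathbb{D}} |\rho(d)| \ge |\Gamma| \cdot (\lceil \log_{|\mathcal{B}|} |R| \rceil - 1) + 1.$$

(b) (i) If there are no overlapping access sets, then Theorem 4.5 tells us that

$$\sum_{i=1}^{|\Gamma|} \#\lfloor \gamma_i(\rho(d)) \rfloor \le |\rho(d)|,$$

and so we conclude that

$$\max_{d \in \mathbb{D}} |\rho(d)| \ge |\Gamma| \cdot \lceil \log_{|\mathcal{B}|} |R| \rceil.$$

(ii) If we do have overlapping access sets, then by Theorem 4.4 we know that
$|\mathcal{B}| = 2$ and by Lemma 4.2 we know $|R| = |X| + 1$. Assume there exists $j \in \mathbb{N}^{+}$ such
that $2^j < |X| < 2^{j+1}$. So in each $\gamma_i$ tree there is some $x_i \in X$ which labels a leaf at
depth $j + 1$. Now define $d_1 \in \mathbb{D}$ so that $d_1(i) = x_i$ for all $1 \le i \le |\Gamma|$. Then

$$|\rho(d_1)| \ge (j + 1) \cdot |\Gamma|.$$

Since $\lceil \log_{|\mathcal{B}|} |R| \rceil = j + 1$, we have

$$\max_{d \in \mathbb{D}} |\rho(d)| \ge |\Gamma| \cdot \lceil \log_{|\mathcal{B}|} |R| \rceil. \qquad ∎$$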

c/^lD ISI 
In the following example we verify that if we allow overlapping access sets for
$|X| = 2^j$, then we may have

$$\max_{d \in \mathbb{D}} |\rho(d)| < |\Gamma| \cdot \lceil \log_{|\mathcal{B}|} |R| \rceil.$$

Also, the bound in Theorem 4.13(a) is tight.

Example 4.9. (a) In Example 4.8 we clearly have

$$\max_{d \in \mathbb{D}} |\rho(d)| = 8 < 3 \cdot \lceil \log_2 5 \rceil = |\Gamma| \cdot \lceil \log_{|\mathcal{B}|} |R| \rceil,$$

since $\rho(d)$ only occupies cells in the set $\{0,1,2,3,4,5,6,7\}$. Note, however, that

$$\max_{d \in \mathbb{D}} |\rho(d)| = 8 > 3 \cdot (\lceil \log_2 5 \rceil - 1).$$

(b) In Example 4.7 we have

$$\max_{d \in \mathbb{D}} |\rho(d)| = 3 < 2 \cdot \lceil \log_2 3 \rceil = 4.$$

However, since

$$\max_{d \in \mathbb{D}} |\rho(d)| = 3 > 2 \cdot (\lceil \log_2 3 \rceil - 1) = 2,$$

the bound in Theorem 4.13(a) is best possible. ∎
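The arithmetic in Example 4.9 can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, the machine alphabet is assumed to be binary, and the observed maxima (8 for Example 4.8, 3 for Example 4.7) are taken from the text rather than computed from the representations.

```python
def ceil_log2(n):
    """Return ceil(log2(n)) for an integer n >= 1, using exact integer arithmetic."""
    return (n - 1).bit_length()

def bounds(num_questions, range_size):
    """Return the lower bounds of Theorem 4.13(a) and 4.13(b) for a binary alphabet."""
    depth = ceil_log2(range_size)
    lower_a = num_questions * (depth - 1) + 1   # Theorem 4.13(a)
    lower_b = num_questions * depth             # Theorem 4.13(b)
    return lower_a, lower_b

# Example 4.8: |Gamma| = 3 and |R| = |X| + 1 = 5; the text reports max |rho(d)| = 8.
example_4_8 = bounds(3, 5)
# Example 4.7: |Gamma| = 2 and |R| = 3; the text reports max |rho(d)| = 3.
example_4_7 = bounds(2, 3)
print(example_4_8, example_4_7)
```

In both examples the reported maximum lies at or above the bound of part (a) and strictly below the bound of part (b), which is the situation the example describes.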

On the other hand, it is sometimes the case that we have overlapping access sets,
$|X| = 2^j$, and also

$$\max_{d \in \mathbb{D}} |\rho(d)| \ge |\Gamma| \cdot \lceil \log_{|\mathcal{B}|} |R| \rceil.$$

Example 4.10 illustrates this.

Example 4.10. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b,c,d\}$, and $\mathbb{D} = \{\lambda\} \cup X^2$. There exists (as
the reader may verify) a representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$ such that the trees shown in
Figure 4.6 implement $\gamma_1$ and $\gamma_2$, respectively. For instance,

    $\rho(\lambda)$ = 10_00__
    $\rho(cd)$ = 10_111
    $\rho(ca)$ = 10_10
    $\rho(bc)$ = 0_1110

In this case

$$\max_{d \in \mathbb{D}} |\rho(d)| = 6 = |\Gamma| \cdot \lceil \log_{|\mathcal{B}|} |R| \rceil. \qquad ∎$$

Figure 4.6. Trees for $\gamma_1$ and $\gamma_2$ of Example 4.10.

Let us now say something about the minimum size a representation can have. In
particular, it is always the case that $|\rho(d)| \ge |J| - 1$. Where there is no access set
overlap, then it follows from Theorem 4.5 that $|\rho(d)| \ge |\Gamma|$.

Theorem 4.14. Let $\mathbb{D} = \bigcup_{i \in J} X^i$ and let $\rho: \mathbb{D} \to \mathcal{B}^{+}$ be some representation.
Assume that all $\gamma_i \in \Gamma$ achieve Kraft access.

(a) Then for all $d \in \mathbb{D}$:

$$|\rho(d)| \ge |J| - 1.$$

(b) If there are no overlapping access sets, then for all $d \in \mathbb{D}$:

$$|\rho(d)| \ge |\Gamma|.$$



Proof: (b) By Theorem 3.9, $\#\lfloor \gamma_i(\rho(d)) \rfloor \ge 1$, and so if there is no access
overlap, then Theorem 4.5 tells us that

$$|\rho(d)| \ge \sum_{i=1}^{|\Gamma|} \#\lfloor \gamma_i(\rho(d)) \rfloor \ge \sum_{i=1}^{|\Gamma|} 1 = |\Gamma|.$$

(a) On the other hand, suppose we allow overlapping access sets. If
$|\rho(d)| < |J| - 1$, then there are at most $|J| - 2$ root node labels. So for $j \in \mathbb{N}^{+}$ not
all of the access trees $\gamma_j$ where $j \in J$ can have distinct root node labels. Pick
$j_1, j_2 \in J$, $j_1 < j_2$, such that $\gamma_{j_1}$ and $\gamma_{j_2}$ have the same root node label. Then by
Theorem 4.8, $X^k \not\subseteq \mathbb{D}$ for any $j_1 \le k < j_2$. But we know that $X^{j_1} \subseteq \mathbb{D}$, a
contradiction. Therefore, $|\rho(d)| \ge |J| - 1$. ∎



Example 4.11 shows us that the bound in Theorem 4.14 is best possible.

Example 4.11. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b\}$, and $\mathbb{D} = \bigcup_{i \in J} X^i$, for $J = \{0,3,5,6\}$.
Consider the representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$ that corresponds to the set of access trees
shown in Figure 4.7. Then all $\gamma_i \in \Gamma$ achieve Kraft access, and

$$|\rho(\lambda)| = 3 = |J| - 1. \qquad ∎$$








Figure 4.7. Access trees for $\gamma_1$, $\gamma_2$, $\gamma_3$ of Example 4.11.

Note that the bound in Theorem 4.14b may also apply to a table lookup question set
that has overlapping access sets; recall Example 4.4.

From Theorem 4.14 it immediately follows that if all $\gamma_i \in \Gamma$ achieve Kraft
access and $\max_{d \in \mathbb{D}} |d|$ is unbounded, then infinite storage is required to represent each
$d \in \mathbb{D}$.

Corollary 4.14.1. Let $\rho: \mathbb{D} \to \mathcal{B}^{+}$ be any representation, and assume that all
$\gamma_i \in \Gamma$ achieve Kraft access. If $\neg(\exists k_1 \in \mathbb{N})(\max_{d \in \mathbb{D}} |d| < k_1)$, then for all
$d \in \mathbb{D}$, $\neg(\exists k_2 \in \mathbb{N})(|\rho(d)| < k_2)$.
CHAPTER 5

IMPLEMENTING THE TABLE LOOKUP QUESTION SET

In Chapter 4 we discussed the set $\Gamma$ of table lookup questions and
consequences of achieving Kraft access for each $\gamma_i \in \Gamma$. In this chapter we
introduce three major classes of representation schemes and then examine the table 
lookup question set in the contexts of these three basic representations: fixed length, 
endmarker, and pointer. The fixed length representation was chosen because it 
sometimes allows us to achieve both Kraft storage and Kraft access. The endmarker 
and pointer representations were chosen because they illustrate techniques commonly 
used for implementing variable length lists. In Chapter 6 we reconsider these 
representations in order to implement stacks. 

5.1 Classes of Representations 

In this section we briefly discuss some basic definitions and representation
techniques and thereby motivate the definitions of fixed length,
endmarker, and pointer representations, which are presented formally in sections
5.2, 5.3, and 5.4, respectively.

We begin with two notational definitions. 

Definition. Consider a function $b \in \mathcal{B}^{+}$ and recall that

$$b = \{(n, m_b(n)) \mid n \in D(b)\},$$

where $b \subseteq m_b$. For $k \in \mathbb{N}$, we define

$$\{b\}_k \triangleq \{(n + k, m_b(n)) \mid n \in D(b)\}.$$

Thus, $\{b\}_k$ is the set $b \in \mathcal{B}^{+}$ "displaced" by $k$, as illustrated in the following
example.

Example 5.1. Consider a function $f: S \to \{0,1\}^{+}$ and let $s_1 \in S$. If

$$f(s_1) = \{(1,0), (3,1), (5,0), (6,1)\},$$

then $\{f(s_1)\}_0 = f(s_1)$

and $\{f(s_1)\}_2 = \{(3,0), (5,1), (7,0), (8,1)\}$. ∎
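The displacement operation is easy to mirror in code. The following Python sketch models a partial memory assignment as a dictionary from cell numbers to symbols; that modelling choice and the name `displace` are assumptions made for illustration only.

```python
def displace(b, k):
    """Return {b}_k: the same cell contents, with every cell number shifted by k."""
    return {n + k: symbol for n, symbol in b.items()}

# f(s_1) from Example 5.1, as a map from cell number to stored symbol.
f_s1 = {1: 0, 3: 1, 5: 0, 6: 1}

shifted_by_0 = displace(f_s1, 0)
shifted_by_2 = displace(f_s1, 2)
print(shifted_by_2)
```

Displacing by 0 leaves the assignment unchanged, and displacing by 2 moves the occupied cells from {1, 3, 5, 6} to {3, 5, 7, 8}, as in the example.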

Also, we shall frequently have occasion to refer to the concatenation of two strings
in $\mathcal{B}^{*}$.

Definition. Let $f_1$ be a function $f_1: S \to \mathcal{B}^{*}$, let $f_2$ be a function $f_2: S \to \mathcal{B}^{*}$,
and let $s_1, s_2 \in S$. We write $f_1(s_1) \cdot f_2(s_2)$ to denote the concatenation of the
strings $f_1(s_1)$ and $f_2(s_2)$, where

$$f_1(s_1) \cdot f_2(s_2) \triangleq f_1(s_1) \cup \{f_2(s_2)\}_{|f_1(s_1)|} \in \mathcal{B}^{*}.$$

Thus, $|f_1(s_1) \cdot f_2(s_2)| = |f_1(s_1)| + |f_2(s_2)|$,

and $D(f_1(s_1) \cdot f_2(s_2)) = \{0, 1, \ldots, |f_1(s_1)| + |f_2(s_2)| - 1\}$.

Notice that when $f_1(s_1) = \lambda$, then $|f_1(s_1)| = 0$ and $f_1(s_1) \cdot f_2(s_2) = f_2(s_2)$; in
particular, $\lambda \cdot \lambda = \lambda$. In an obvious way, the definition can be extended to the
concatenation of any countable number of strings.

Example 5.2. Define the function $f: \{a,b,c\} \to \{0,1\}^{*}$ by

    f(a) = 0
    f(b) = 10
    f(c) = 11

Then

$$f(a) \cdot f(b) = \{(0,0)\} \cup \{(0,1), (1,0)\}_1 = \{(0,0), (1,1), (2,0)\} = 010$$

and

$$f(c) \cdot f(c) = \{(0,1), (1,1)\} \cup \{(0,1), (1,1)\}_2 = \{(0,1), (1,1), (2,1), (3,1)\} = 1111. \qquad ∎$$
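Concatenation as the union of one string with a displaced copy of the other can be sketched in Python as follows. Strings are modelled as dictionaries from cell number to bit; the helper names `to_cells`, `concat`, and `to_string` are hypothetical and not part of the original text.

```python
def to_cells(s):
    """Model a string over {0,1} as a map from cell number to bit."""
    return {i: int(ch) for i, ch in enumerate(s)}

def concat(u, v):
    """Return u . v, i.e. u united with v displaced by |u| cells."""
    shifted = {n + len(u): symbol for n, symbol in v.items()}
    return {**u, **shifted}

def to_string(b):
    """Read back a contiguous assignment occupying cells 0 .. |b|-1."""
    return "".join(str(b[i]) for i in range(len(b)))

f = {"a": "0", "b": "10", "c": "11"}   # the code of Example 5.2

ab = concat(to_cells(f["a"]), to_cells(f["b"]))
cc = concat(to_cells(f["c"]), to_cells(f["c"]))
print(to_string(ab), to_string(cc))
```

Because the displaced copy starts exactly at cell |u|, the two domains are disjoint and the lengths add, as stated in the definition.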

Many commonly used representation schemes involve the concatenation of
encodings of a set $X$. For instance, given a function $f: X \to \mathcal{B}^{*}$ it would seem
natural to encode $x_1 x_2 \ldots x_k \in X^k$ as $f(x_1) \cdot f(x_2) \cdot \ldots \cdot f(x_k)$. Similarly, we
could encode $X^k$ by placing each of $f(x_1), \ldots, f(x_k)$ into a fixed field. We
illustrate these schemes in Example 5.3a and 5.3b.

Example 5.3. Let $X = \{a,b,c,d\}$, $\mathcal{B} = \{0,1\}$, and consider a function $f: X \to \mathcal{B}^{*}$
defined by

    f(a) = 00
    f(b) = 010
    f(c) = 011
    f(d) = 10

Assume that the domain is of the form $\mathbb{D} = \bigcup_{i \in J} X^i$, and we want to define a
mapping from $\mathbb{D}$ to $\mathcal{B}^{+}$.

(a) Consider the function $f_1: \mathbb{D} \to \mathcal{B}^{*}$, where

$$f_1(d) = f(d(1)) \cdot f(d(2)) \cdot \ldots \cdot f(d(|d|)).$$

For instance

    $f_1(abad) = f(a) \cdot f(b) \cdot f(a) \cdot f(d)$ = 000100010
    $f_1(bdb) = f(b) \cdot f(d) \cdot f(b)$ = 01010010
    $f_1(\lambda) = \lambda$

Notice that, for $|J| > 1$, $f_1$ is not a representation because there is no way to
recognize the end of the string $f_1(d)$; e.g., $f_1(b)$ and $f_1(ba)$ are indistinguishable
since $f_1(b) \subset f_1(ba)$.

(b) Consider the function $f_2: \mathbb{D} \to \mathcal{B}^{+}$, where

$$f_2(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{3(i-1)}.$$

Then

    $f_2(abad)$ = 00_01000_10_
    $f_2(bdb)$ = 01010_010
    $f_2(\lambda) = \lambda$.

As in the case of $f_1$, the function $f_2$ is a representation if and only if $|J| = 1$. ∎
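Both schemes of Example 5.3 can be sketched in a few lines of Python. Memory contents are written as strings in which `_` marks an unoccupied cell; the function names `f1` and `f2` follow the example, while the `width` parameter is an assumption introduced here to make the field size explicit.

```python
F = {"a": "00", "b": "010", "c": "011", "d": "10"}   # the code f of Example 5.3

def f1(d):
    """Plain concatenation f(d(1)) . f(d(2)) . ... . f(d(|d|))."""
    return "".join(F[x] for x in d)

def f2(d, width=3):
    """Place each codeword in its own field of `width` cells; '_' is an unused cell."""
    return "".join(F[x].ljust(width, "_") for x in d)

print(f1("abad"), f1("bdb"))
print(f2("abad"), f2("bdb"))

# f1 alone cannot mark the end of a list: f1("b") is a prefix of f1("ba").
ambiguous = f1("ba").startswith(f1("b"))
```

The prefix check at the end is the reason given in the example for why $f_1$ is not a representation when lists of more than one length are allowed.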

In the previous example, $f_1$ and $f_2$ would be representations, even for $|J| > 1$, if
there were some way of detecting the ends of codewords $f_1(d)$ and $f_2(d)$. In
particular, we might reserve some symbol to mark the end of the list or we might
give some specification of the length $|d|$ or the length $|\rho(d)|$.

Many representations that we consider are what we call
concatenation-preserving, where the encoding of a list includes the encodings of the
individual elements in the list. We now generalize the familiar notion of
concatenation of encodings of list elements to not necessarily imply a "left to right"
ordering, only that the encodings are in disjoint sets of memory cells. Thus, if we
know where to look then it is possible to determine $d(i)$ and obtain no information
about $d(j)$, for $1 \le i, j \le |d|$.

Definition. Let $\mathbb{D} = \bigcup_i X^i$ and consider a function $f: X \to \mathcal{B}^{+}$. Define the
function $f': \mathbb{D} \to \mathcal{B}^{+}$ by

$$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)}$$

where $n_i: \mathbb{D} \to \mathbb{N}$. Then $f'$ is said to be a concatenation-preserving function if,
for all $i \neq j$,

$$D(\{f(d(i))\}_{n_i(d)}) \cap D(\{f(d(j))\}_{n_j(d)}) = \emptyset.$$

Let $g$ be any function $g: \mathbb{D} \to \mathcal{B}^{+}$ and let $f'$ be the function defined above.
Consider the function $\rho: \mathbb{D} \to \mathcal{B}^{+}$ defined by the union

$$\rho(d) = \{f'(d)\}_{n_1(d)} \cup \{g(d)\}_{n_2(d)},$$

where $n_1: \mathbb{D} \to \mathbb{N}$, $n_2: \mathbb{D} \to \mathbb{N}$. If $\rho$ is, in fact, a representation and if

$$D(\{f'(d)\}_{n_1(d)}) \cap D(\{g(d)\}_{n_2(d)}) = \emptyset,$$

then $\rho$ is said to be a concatenation-preserving representation.

The condition that the domains of $\{f(d(i))\}_{n_i(d)}$ and $\{f(d(j))\}_{n_j(d)}$ not intersect
guarantees that $f'$ is, in fact, a concatenation of encodings of the list elements and
that the representations of the list elements do not overlap. Notice that the function
$g$ can be chosen in any way whatsoever, so long as the resulting union, $\rho$, is a
representation. We now reconsider Example 5.3 and see that $f_1$ and $f_2$ are
concatenation-preserving functions.
Example 5.4. The functions $f_1$ and $f_2$ from Example 5.3 are
concatenation-preserving functions, since they fit the form of the above definition.

(a) Given the function $f: X \to \mathcal{B}^{*}$ as in Example 5.3, we can define $f_1: \mathbb{D} \to \mathcal{B}^{+}$ by

$$f_1(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)}, \quad \text{where } n_i(d) = \sum_{j=1}^{i-1} |f(d(j))|.$$

Since $n_{i+1}(d) - n_i(d) = |f(d(i))|$, it is clear that

$$D(\{f(d(i))\}_{n_i(d)}) \cap D(\{f(d(j))\}_{n_j(d)}) = \emptyset.$$

(b) Recall that we defined $f_2: \mathbb{D} \to \mathcal{B}^{+}$ by

$$f_2(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{3(i-1)}.$$

Since $\max D(f(x)) = 2$ the domains do not intersect, and it is clear that $f_2$ is a
concatenation-preserving function. ∎

Recalling Example 5.3, when $\mathbb{D} = X^k$ we know that $|d| = k$ and $f_1$ and $f_2$ are,
in fact, representations. When we wish to allow $\mathbb{D} \neq X^k$, however, then we may
wish to consider one of the following three representation schemes.

(i) If $|\rho(d)|$ is of fixed size for all $d \in \mathbb{D}$, then there is no need to specify
$|\rho(d)|$. Fixed length representations are discussed in detail in Section
5.2.
(ii) An endmarker representation reserves some symbol or set of symbols
$b \in \mathcal{B}^{+}$ to indicate the end of the list $f'(d)$. A formal definition is given
in Section 5.3.
(iii) We can encode the length $|d|$ itself and use this as a pointer. A pointer
representation is defined formally in Section 5.4.

We illustrate endmarker and pointer representations in Example 5.5a and 5.5b, by
extending the function $f_1$ from examples 5.3 and 5.4.
Example 5.5. (a) Recall from examples 5.3 and 5.4 the function $f: X \to \mathcal{B}^{*}$ and the
function $f_1: \mathbb{D} \to \mathcal{B}^{+}$. We can then define the representation $\rho_1: \mathbb{D} \to \mathcal{B}^{+}$ by

$$\rho_1(d) = \{f_1(d)\}_0 \cup \{g(d)\}_{n_2(d)}$$

where $g: \mathbb{D} \to \mathcal{B}^{+}$ is defined by

$$g(d) = 11$$

and where $n_2(d) = \sum_{j=1}^{|d|} |f(d(j))|$.

Since we already know that $f_1$ is a concatenation-preserving function, we need only
note that

$$D(\{g(d)\}_{n_2(d)}) \cap D(\{f_1(d)\}_0) = \emptyset$$

in order to verify that $\rho_1$ is, in fact, a concatenation-preserving representation. For
instance,

    $\rho_1(abad) = \{f_1(abad)\}_0 \cup \{g(abad)\}_9$
                 $= \{f(a) \cdot f(b) \cdot f(a) \cdot f(d)\}_0 \cup \{11\}_9$ = 00010001011
    $\rho_1(bdb) = f(b) \cdot f(d) \cdot f(b) \cdot g(bdb)$ = 0101001011
    $\rho_1(c)$ = 01111
    $\rho_1(\lambda)$ = 11.

Notice that $|\rho_1(d)| = \sum_{j=1}^{|d|} |f(d(j))| + |g(d)|$.

Since $g(d)$ and $f(x)$ are distinguishable for all $x \in X$, the string $g(d) = 11$ serves
as an endmarker, allowing us to detect when the end of the list has been reached.
However, since we also have, e.g., $f(c) = 011$, not every occurrence of the string 11
corresponds to the endmarker. It is necessary to somehow decode $\rho_1(d)$ as we read
it.

(b) Recall the functions $f: X \to \mathcal{B}^{*}$ and $f_2: \mathbb{D} \to \mathcal{B}^{+}$ from Example 5.3. Define the
concatenation-preserving representation $\rho_2: \mathbb{D} \to \mathcal{B}^{+}$ by

$$\rho_2(d) = \{f_2(d)\}_{|d|+1} \cup \{g(d)\}_0$$

where $g: \mathbb{D} \to \mathcal{B}^{+}$ is defined by

$$g(d) = 1^{|d|}0.$$

Notice that $g(d)$ corresponds to the length $|d|$. Thus, after reading $g(d)$, we shall
always be able to tell when we are at the end of the list representation $f_2(d)$. For
instance,

    $\rho_2(abad)$ = 1111000_01000_10_
    $\rho_2(bdb)$ = 111001010_010
    $\rho_2(\lambda)$ = 0

We shall later discuss more "efficient" pointer representations. ∎
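The two extensions in Example 5.5 can be sketched as follows. This is an illustrative Python model, not a definitive implementation: memory is written as a string with `_` for unoccupied cells, and the names `rho1`, `rho2`, and `decode_rho1` are hypothetical. The decoder shows why the endmarker must be found by decoding codeword by codeword rather than by searching for the substring 11.

```python
F = {"a": "00", "b": "010", "c": "011", "d": "10"}   # the code f of Example 5.3
END = "11"                                           # endmarker g(d) of Example 5.5(a)

def rho1(d):
    """Endmarker scheme: concatenated codewords followed by the endmarker 11."""
    return "".join(F[x] for x in d) + END

def decode_rho1(cells):
    """Decode left to right, one codeword at a time, until the endmarker is read."""
    inverse = {code: x for x, code in F.items()}
    decoded, word = [], ""
    for bit in cells:
        word += bit
        if word == END:
            return "".join(decoded)
        if word in inverse:
            decoded.append(inverse[word])
            word = ""
    raise ValueError("no endmarker found")

def rho2(d, width=3):
    """Pointer scheme: unary length 1^{|d|}0 followed by fixed three-cell fields."""
    fields = "".join(F[x].ljust(width, "_") for x in d)
    return "1" * len(d) + "0" + fields

print(rho1("abad"), decode_rho1(rho1("abad")))
print(rho2("abad"), rho2("bdb"), rho2(""))
```

Decoding works because the codewords of f together with the endmarker form a prefix-free set, so each codeword boundary is recognized as soon as it is reached.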

If in a concatenation-preserving function the functions $n_i$ are all constant
functions (i.e., the values of $n_i$ are not functions of the particular $d$ being
represented) then we say that the function has fixed position fields. Intuitively,
this says that if we were to ask the question $\gamma_i$, for $i \le |d|$, then we would always
know where in the representation to begin reading.

Definition. Let $\mathbb{D} = \bigcup_i X^i$ and let $f$ be a function $f: X \to \mathcal{B}^{+}$. Consider a
concatenation-preserving function $f': \mathbb{D} \to \mathcal{B}^{+}$ defined by

$$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)}$$

where $n_i: \mathbb{D} \to \mathbb{N}$. If for all $d_1, d_2 \in \mathbb{D}$ and for all $j$, $1 \le j \le \max i$,

$$n_j(d_1) = n_j(d_2)$$

then the function $f'$ is said to have fixed position fields. We define an $n_i$ field
to be the set

$$\bigcup_{x \in X} D(f(x)) + n_i$$

for $1 \le i \le |d|$, where we use the notation

$$\{s_1, s_2, \ldots, s_p\} + k \triangleq \{s_1 + k, s_2 + k, \ldots, s_p + k\}.$$

Clearly neither of the extensions of $f_1$ in Example 5.5 gives us a function with fixed
position fields. The function $f_2$ from Example 5.3 is, however, a fixed position
field function, since $n_i(d) = 3 \cdot (i - 1)$ for all $d \in \mathbb{D}$ and thus each $n_i$ is a constant
function. Since each $n_i$ field consists of all cells which may be occupied by
$f(d(i))$, the $n_i$ field for $f_2$ of Example 5.3 is just $\{n_i, n_i + 1, n_i + 2\}$. Notice that it
is not necessary that an $n_i$ field consist of contiguous memory cells, although for
simplicity most of our examples will be of this form. In fact, it is possible for two
$n_i$ fields to "cross"; e.g., we might have

$$k_1, \; k_1 + k_2 \in \bigcup_{x \in X} D(f(x)) + n_i$$

and

$$k_1 + k_3 \in \bigcup_{x \in X} D(f(x)) + n_{i+1},$$

for $1 \le k_3 < k_2$. Example 5.6 gives an example of a concatenation-preserving
representation with fixed position fields, where a field does not consist of
contiguous cells.

Example 5.6. Let $X = \{a,b,c\}$, $\mathcal{B} = \{0,1\}$, and $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. Define the function
$f: X \cup \{\emptyset\} \to \mathcal{B}^{+}$ by

    f(a) = 0_0
    f(b) = 0_1
    f(c) = 1_0
    f(∅) = 1_1

Consider the representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{1\_1\}_{n_{|d|+1}},$$

where

$$n_i = \begin{cases} 2i - 3 & \text{for } i \text{ even} \\ 2i - 2 & \text{for } i \text{ odd} \end{cases}$$

Thus, $d(1)$ occupies cells 0 and 2, $d(2)$ occupies cells 1 and 3, $d(3)$ occupies cells 4
and 6, $d(4)$ occupies cells 5 and 7, $d(5)$ occupies cells 8 and 10, etc. For instance,

    $\rho(\lambda)$ = 1_1
    $\rho(abaa)$ = 000100001_1
    $\rho(bacba)$ = 001010010101.

So an $n_i$ field is not a set of contiguous cells. In fact, the $n_3$ field is

$$\bigcup_{x \in X} D(f(x)) + n_3 = \bigcup_{x \in X} D(f(x)) + 4 = \{4, 6\}$$

and the $n_4$ field is

$$\bigcup_{x \in X} D(f(x)) + n_4 = \bigcup_{x \in X} D(f(x)) + 5 = \{5, 7\}.$$

Notice that the $n_3$ field and the $n_4$ field "cross", since

$$4, 6 \in \bigcup_{x \in X} D(f(x)) + n_3$$

and

$$5 \in \bigcup_{x \in X} D(f(x)) + n_4.$$
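The interleaved fields of Example 5.6 can be reproduced with a short Python sketch. It assumes, as the cell assignments in the example indicate, that every codeword occupies the two cells at relative positions 0 and 2; the names `offset` and `rho`, and the use of `None` for the end-of-list symbol, are choices made here for illustration.

```python
F = {"a": (0, 0), "b": (0, 1), "c": (1, 0)}   # bits stored at relative cells 0 and 2
END = (1, 1)                                  # codeword of the end-of-list symbol

def offset(i):
    """Field position n_i: 2i - 3 for even i, 2i - 2 for odd i."""
    return 2 * i - 3 if i % 2 == 0 else 2 * i - 2

def rho(d):
    """Write each element into its field, then the endmarker into field |d| + 1."""
    cells = {}
    for i, x in enumerate(list(d) + [None], start=1):
        first, second = END if x is None else F[x]
        cells[offset(i)] = first
        cells[offset(i) + 2] = second
    # Render cells 0 .. max as a string, with '_' for unoccupied cells.
    return "".join(str(cells.get(n, "_")) for n in range(max(cells) + 1))

field_3 = {offset(3), offset(3) + 2}
field_4 = {offset(4), offset(4) + 2}
print(rho(""), rho("abaa"), rho("bacba"))
```

The third and fourth fields come out as {4, 6} and {5, 7}, which is the "crossing" pattern described in the example.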
5.2 Fixed Size Representations

In theorems 4.9 and 4.10 we showed that if a representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$
achieves Kraft storage and also achieves Kraft access for all $\gamma_i \in \Gamma$, then $\mathbb{D} = X^n$ or
$\mathbb{D} = \{\lambda\} \cup X^n$. In this section we show that it is possible, where $\mathbb{D} = X^n$ or
$\mathbb{D} = \{\lambda\} \cup X$, to have Kraft storage and access with a fixed size representation. In
fact, if the relative sizes of the problem and machine alphabets are chosen
correctly, and if the domain is of one of the two appropriate forms, then there is
always a fixed size representation which achieves Kraft storage and access (see
Theorem 5.5 and Corollary 5.5.1).

Recalling Section 5.1, a representation $\rho$ is said to be of fixed size if it maps
all strings in $\mathbb{D}$ into strings of the same length.

Definition. A representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$ is a fixed size representation function
of size $r$ if and only if

$$(\forall d \in \mathbb{D})(|\rho(d)| = r).$$

Notice that the definition makes no requirement that $D(\rho(d)) = \{0, 1, \ldots, |d| - 1\}$,
and in general $\rho(d)$ might occupy any $r$ cells of memory, not necessarily
contiguous. Of course, we frequently consider a representation $\rho: \mathbb{D} \to \mathcal{B}^{r}$, where
each $d \in \mathbb{D}$ is mapped onto a sequence $m = m(1)m(2) \ldots m(r) = \rho(d)$, for
$m(i) \in \mathcal{B}$. For any fixed size representation, however, it is known that each $\rho(d)$
occupies exactly $r$ cells, and so it is not necessary to store any additional
information concerning the length of the representation. Let us look at two
examples of fixed size representations.

Example 5.7. Let $\mathbb{D} = \{\lambda\} \cup X \cup X^2$, $X = \{a,b\}$, and $\mathcal{B} = \{0,1\}$. Define the fixed
size representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$ as follows:

    $\rho(\lambda)$ = 000
    $\rho(a)$ = 001
    $\rho(b)$ = 010
    $\rho(aa)$ = 011
    $\rho(ab)$ = 100
    $\rho(ba)$ = 101
    $\rho(bb)$ = 110

Since there is no $d \in \mathbb{D}$ such that $\rho(d) = 111$, $\rho$ does not achieve Kraft storage.
Also, it is not possible, using representation $\rho$, to implement any $\gamma_i \in \Gamma$ so as to
achieve Kraft access. (If $\gamma_i$ did achieve Kraft access, then the tree for $\gamma_i$ would
have three leaves and therefore two internal nodes. So one answer among a, b, $\emptyset$
would be determined in a single access, but by inspection we can see that this
cannot happen.) ∎
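The "by inspection" step of Example 5.7 can be carried out exhaustively. The Python sketch below checks, for each question and each single cell value, whether the answer is already determined; the names `answer` and `single_access_answers`, and the use of `None` for the empty answer, are assumptions made here for illustration.

```python
from fractions import Fraction

RHO = {"": "000", "a": "001", "b": "010", "aa": "011",
       "ab": "100", "ba": "101", "bb": "110"}        # Example 5.7

def answer(d, i):
    """Answer to table lookup question i: the i-th element, or None if there is none."""
    return d[i - 1] if i <= len(d) else None

def single_access_answers(i):
    """List every (cell, bit, answer) where reading one cell settles question i."""
    found = []
    for cell in range(3):
        for bit in "01":
            answers = {answer(d, i) for d, code in RHO.items() if code[cell] == bit}
            if len(answers) == 1:
                found.append((cell, bit, answers.pop()))
    return found

# Kraft sum of the fixed size code: seven codewords of three cells each.
kraft_sum = sum(Fraction(1, 2 ** len(code)) for code in RHO.values())
print(kraft_sum, single_access_answers(1), single_access_answers(2))
```

Both lists come out empty, so no answer to either question can be read off a single cell, and the Kraft sum is 7/8 rather than 1.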

Example 5.8 illustrates a procedure for constructing a fixed size representation for
which, if $\mathbb{D} \neq X^n$ and $|X| = |\mathcal{B}|^k - 1$, we can attain Kraft access (although not
Kraft storage). Notice that $r = k \cdot |\Gamma|$, and we answer $\gamma_i$ by first accessing cell
$(i-1)k$.

Example 5.8. Let $\mathbb{D} = \bigcup_{i=0}^{n} X^i$, $X = \{a,b,c\}$, and $\mathcal{B} = \{0,1\}$. Define the fixed size
concatenation-preserving representation $\rho: \mathbb{D} \to \mathcal{B}^{2n}$ by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{2(i-1)} \;\cup \bigcup_{i=|d|+1}^{n} \{f(\emptyset)\}_{2(i-1)}$$

where $f: X \cup \{\emptyset\} \to \mathcal{B}^{2}$ is defined by

    f(a) = 00
    f(b) = 01
    f(c) = 10
    f(∅) = 11

In particular, for $n = 2$ we have

    $\rho(\lambda)$ = 1111
    $\rho(a)$ = 0011
    $\rho(b)$ = 0111
    $\rho(c)$ = 1011
    $\rho(aa)$ = 0000
    $\rho(ab)$ = 0001
    $\rho(ac)$ = 0010
    $\rho(ba)$ = 0100
    $\rho(bb)$ = 0101
    $\rho(bc)$ = 0110
    $\rho(ca)$ = 1000
    $\rho(cb)$ = 1001
    $\rho(cc)$ = 1010

Notice that $|\mathcal{B}| = 2$ and $2^2 - 1 = 3 = |X|$. So $r = 2 \cdot 2$ and to answer $\gamma_i$ we first access
cell $2 \cdot (i-1)$. Figure 5.1 illustrates access trees for $\gamma_1$ and $\gamma_2$, and it is clear that we
achieve Kraft access. On the other hand, $\rho$ does not have Kraft storage because

$$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = 13 \cdot 2^{-4} < 1.$$

Intuitively, we would have achieved Kraft storage if we had altered the definition of
$\rho$ by letting $\rho(\lambda)$ = 11__; this would have made $\rho(\mathbb{D})$ a complete code. Instead,
we chose to specify values for $m(2)$ and $m(3)$ so we could always answer $\gamma_2$ in two
accesses. This illustrates a trade-off between Kraft storage and Kraft access. ∎
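The storage side of this trade-off can be verified numerically. The Python sketch below builds the representation of Example 5.8 for n = 2 and compares its Kraft sum with that of the altered code in which the empty list occupies only two cells; the names `rho` and `kraft_sum`, and the `_` notation for unoccupied cells, are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product

F = {"a": "00", "b": "01", "c": "10"}   # the code f of Example 5.8
EMPTY = "11"                            # codeword f(empty) used to pad unused fields

def rho(d, n):
    """Fixed size representation: |d| element fields, then n - |d| padding fields."""
    return "".join(F[x] for x in d) + EMPTY * (n - len(d))

def kraft_sum(codes):
    """Sum of 2^(-occupied cells) over all codewords; '_' marks an unoccupied cell."""
    return sum(Fraction(1, 2 ** sum(ch != "_" for ch in code)) for code in codes)

n = 2
domain = ["".join(t) for k in range(n + 1) for t in product("abc", repeat=k)]
codes = {d: rho(d, n) for d in domain}
original_sum = kraft_sum(codes.values())

# Altered code: the empty list occupies only the first field.
altered = dict(codes)
altered[""] = "11__"
altered_sum = kraft_sum(altered.values())
print(original_sum, altered_sum)
```

Freeing cells 2 and 3 for the empty list raises the sum from 13/16 to 1, at the price that the second question can no longer always be answered from those two cells.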





Figure 5.1. Access trees for $\gamma_1$ and $\gamma_2$ of Example 5.8.

Notice that when we define some fixed size representation $\rho$, we have not
explicitly said anything about the elements in the problem domain $\mathbb{D}$. If, however,
we meet Kraft storage, then we know by the following theorem that there are $|\mathcal{B}|^r$
elements in the domain.

Theorem 5.1. Let $\rho: \mathbb{D} \to \mathcal{B}^{+}$ be a fixed size representation of size $r$. $\rho$
achieves Kraft storage if and only if $|\mathbb{D}| = |\mathcal{B}|^r$.

Proof: Since $|\rho(d)| = r$ for all $d \in \mathbb{D}$, then

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = |\mathbb{D}| \cdot |\mathcal{B}|^{-r},$$

and we have Kraft storage if and only if $|\mathbb{D}| \cdot |\mathcal{B}|^{-r} = 1$; that is, if and only if
$|\mathbb{D}| = |\mathcal{B}|^r$. ∎

Notice that we could, of course, be representing any $|\mathcal{B}|^r$ strings in $\mathbb{D}$.

We know by Theorem 3.10 that we cannot achieve Kraft access for $|X| < |\mathcal{B}|$.
Unfortunately, even for $|X| \ge |\mathcal{B}|$, the conditions $|\mathbb{D}| = |\mathcal{B}|^r$ and $\mathbb{D} = \bigcup_i X^i$ do not
guarantee that there is a fixed size representation that attains Kraft access.

Example 5.9. Let $|\mathcal{B}| = 3$, $|X| = 4$, and $\mathbb{D} = \{\lambda\} \cup X^2 \cup X^3$. Then for $r = 4$, we
have $|\mathcal{B}|^r = 3^4 = 4^3 + 4^2 + 1 = |\mathbb{D}|$, and a fixed size representation $\rho: \mathbb{D} \to \mathcal{B}^{4}$ is
storage optimal. On the other hand, by theorems 4.9 and 4.10 we know that there
is no representation, fixed size or otherwise, that achieves both Kraft storage and
Kraft access for the table lookup question set $\Gamma = \{\gamma_1, \gamma_2, \gamma_3\}$. ∎

In the last chapter, we have already shown that in order to possibly achieve
Kraft storage and Kraft access, it must be the case that $\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X^n$.
If we wish a fixed size representation to have Kraft storage and access, then either
$\mathbb{D} = X^n$ or else we have the less interesting situation where $\mathbb{D} = \{\lambda\} \cup X$.

Lemma 5.1. Let $\mathbb{D} = \{\lambda\} \cup X^n$ and consider a fixed size representation
$\rho: \mathbb{D} \to \mathcal{B}^{+}$, of size $r$, which achieves Kraft storage. Assume also that each
$\gamma_i \in \Gamma$ achieves Kraft access. Then $|\mathcal{B}| = 2$ and $|\Gamma| = 1$; i.e., $\mathbb{D} = \{\lambda\} \cup X^1$
and $|X| = 1$.

Proof: By Theorem 4.9 and Corollary 4.9.1, since $\mathbb{D} \neq X^n$ then the only way we
can achieve both Kraft storage and access is to have $|\mathcal{B}| = 2$ and for there to be
some $\gamma_i, \gamma_j \in \Gamma$ such that $\gamma_i$ and $\gamma_j$ have overlapping sets. As a consequence of
Theorem 4.8, we know that $\gamma_i, \gamma_j$ access some cell in common if and only if
$X^k \not\subseteq \mathbb{D}$ for all $i \le k < j$. Since $\mathbb{D} = \{\lambda\} \cup X^n$, then any pair of table lookup
questions has overlapping access sets. Now lemmas 4.2 and 4.4 allow us to conclude
that each $\gamma_i \in \Gamma$ has the same root node label and each has a leaf labelled $\emptyset$ at
depth 1. But this says that $|\rho(\lambda)| = 1$, and so if $\rho$ is a fixed size representation
then it is a fixed size representation of length 1. Thus $|\Gamma| = 1$. If $\gamma_1$ has a leaf
labelled a at depth greater than 1, then $|\rho(\lambda)| \neq |\rho(a)|$ and $\rho$ could not be a fixed
size representation. Thus $\gamma_1$ has its only two leaves at depth one and so we have
the trivial case $\mathbb{D} = \{\lambda\} \cup X^1$ and $|X| = 1$. ∎

Of course, the above lemma simply says that if a fixed size representation achieves
Kraft storage and access, then $\mathbb{D} = \{\lambda\} \cup X$. The following example shows that it
is, in fact, possible to have $\mathbb{D} = \{\lambda\} \cup X$ for a fixed size representation which does
have Kraft storage and access.

Example 5.10. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b,c\}$, and $\mathbb{D} = \{\lambda\} \cup X$. Define the
representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$ by

    $\rho(\lambda)$ = 0_1
    $\rho(a)$ = 0_0
    $\rho(b)$ = 10_
    $\rho(c)$ = 11_

Clearly $\rho$ is a storage optimal fixed size representation of size 2, and from Figure
5.2 we see that it is possible to implement $\gamma_1$ so that it has Kraft access. ∎

Figure 5.2. Access tree for $\gamma_1$ of Example 5.10.

Now from Lemma 5.1 and theorems 4.9 and 4.10 we obtain the following result.

Theorem 5.2. Consider a fixed size representation $\rho: \mathbb{D} \to \mathcal{B}^{+}$. Assume $\rho$
achieves Kraft storage and each $\gamma_i \in \Gamma$ achieves Kraft access. Then $\mathbb{D} = X^n$
or $\mathbb{D} = \{\lambda\} \cup X$.

In fact, achieving Kraft storage and access with a fixed size representation tells
us something about the relative sizes of the problem and machine alphabets.

Lemma 5.2. Consider a fixed size representation $\rho: \mathbb{D} \to \mathcal{B}^+$ which achieves
Kraft storage, and assume that all $\gamma_i \in \Gamma$ achieve Kraft access. Then, for all
$\gamma_i \in \Gamma$, the access tree for $\gamma_i$ has uniform depth.

Proof: By Theorem 5.2, there are two cases to consider:

(i) $\mathbb{D} = \{\lambda\} \cup X$. In this case we know by Lemma 5.1 that $m = 1$ and $|X| = 1$, so
$\gamma_1$ clearly has uniform depth and $\rho$ is a fixed size representation.

(ii) $\mathbb{D} = X^n$. Then by Theorem 4.10 there are no overlapping access sets. Assume
there is some $\gamma_n$ whose access tree does not have uniform depth; in particular, let
leaves labelled $x_1, x_2 \in X$ be at different depths. Then there exist $d_1, d_2 \in \mathbb{D}$ such
that $d_1(n) = x_1$ and

$$d_2(i) = \begin{cases} d_1(i) & \text{for } i \neq n \\ x_2 & \text{for } i = n \end{cases}$$

By Theorem 4.12,

$$|\rho(d_1)| = \sum_{i \neq n} \#[\gamma_i(\rho(d_1))] + \#[\gamma_n(\rho(d_1))]$$

and

$$|\rho(d_2)| = \sum_{i \neq n} \#[\gamma_i(\rho(d_2))] + \#[\gamma_n(\rho(d_2))].$$

Since $d_1$ and $d_2$ agree in every position other than $n$, the two sums over $i \neq n$ are equal. But

$$\#[\gamma_n(\rho(d_1))] = \#[x_1] \neq \#[x_2] = \#[\gamma_n(\rho(d_2))].$$

Thus $|\rho(d_1)| \neq |\rho(d_2)|$, implying $\rho$ is not a fixed size representation, a
contradiction. So each $\gamma_i$ has a tree of uniform depth. ∎






This lemma allows us to prove that $|X| = |\mathcal{B}|^k$ or $|X| = |\mathcal{B}|^k - 1$ if we are to attain
Kraft storage and access with a fixed size representation.

Theorem 5.3. Consider a fixed size representation $\rho: \mathbb{D} \to \mathcal{B}^+$ which achieves
Kraft storage, and assume all $\gamma_i \in \Gamma$ achieve Kraft access. If $\mathbb{D} = X^n$ then
$|\mathcal{B}|^k = |X|$ for some $k \in \mathbb{N}$, and if $\mathbb{D} = \{\lambda\} \cup X$ then $|\mathcal{B}|^k = |X| + 1$.

Proof: Let $\mathbb{D} = X^n$. By Lemma 5.2, we know that the access tree for $\gamma_i$ has
uniform depth, say $k$, and so $|\mathcal{B}|^k = |R(\gamma_i(\mathbb{D}))| = |X|$. Similarly, for
$\mathbb{D} = \{\lambda\} \cup X$, $|R(\gamma_i(\mathbb{D}))| = |X| + 1 = |\mathcal{B}|^k$. ∎

The following example illustrates, however, that attaining Kraft storage and
access, even where $\mathbb{D} = X^n$ and $|X| = |\mathcal{B}|^k$, does not necessarily mean our
representation has fixed size.

Example 5.11. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b,c,d\}$, and $\mathbb{D} = X^2$. Define the
representation $\rho: \mathbb{D} \to \mathcal{B}^+$ as illustrated in Figure 5.3. More specifically, for
$x_1, x_2 \in X$, we can let $\rho(x_1 \cdot x_2) = f_1(x_1) \cdot f_2(x_2)$, where the representation tree
for $f_i$ has the same form as the access tree $\gamma_i$. For instance,

ρ(ac) = 0__10
ρ(ad) = 0__11
ρ(bb) = 10_01
ρ(dc) = 11110

From the trees it is clear that $\gamma_1$ and $\gamma_2$ achieve Kraft access. Also, since

$$|\rho(x_1 \cdot x_2)| = |f_1(x_1)| + |f_2(x_2)| = |f_1(x_1)| + 2,$$

the reader can verify that

$$\sum_{d \in X^2} 2^{-|\rho(d)|} = 4 \cdot 2^{-3} + 4 \cdot 2^{-4} + 8 \cdot 2^{-5} = 1$$

and so $\rho$ achieves Kraft storage. ∎

Figure 5.3. Access trees for $\gamma_1$ and $\gamma_2$ of Example 5.11.

On the other hand, the following example shows that we could have defined $\rho$ in
the above example to be a fixed size representation and still have attained Kraft
storage and access. In fact, there would always be such a fixed size representation.

Example 5.12. Let $\mathcal{B}$, $X$, and $\mathbb{D}$ be the same as in Example 5.11 and define the
representation $\rho: \mathbb{D} \to \mathcal{B}^+$ as illustrated in Figure 5.4. For instance,

ρ(ac) = 0010
ρ(ad) = 0011
ρ(bb) = 0101
ρ(dc) = 1110

Clearly we achieve both Kraft access and Kraft storage. ∎

Figure 5.4. Access trees for $\gamma_1$ and $\gamma_2$ of Example 5.12.



To help motivate some further discussions, we first prove the following simple
lemma.

Lemma 5.3. Let $\mathbb{D} = X^n$. Then the following statements are equivalent.

(1) There is some representation $\rho: X \to \mathcal{B}^+$ which attains Kraft storage.

(2) There is some implementation for which each $\gamma_i \in \Gamma$ achieves Kraft
access.

(3) There is some $k \in \mathbb{N}$ such that $|X| = k \cdot (|\mathcal{B}| - 1) + 1$.

Proof: For $\mathbb{D} = X^n$, $R(\gamma_i(\mathbb{D})) = X$. There is some representation $\rho$ which
attains Kraft storage for $x \in X$ if and only if there is a $|\mathcal{B}|$-ary tree with $|X|$ leaves
if and only if $|X| = k \cdot (|\mathcal{B}| - 1) + 1$. Also, $\gamma_i$ achieves Kraft access if and only if its
$|\mathcal{B}|$-ary tree has $|X|$ leaves if and only if $|X| = k \cdot (|\mathcal{B}| - 1) + 1$. ∎



It is not the case, however, that Kraft storage for a representation $\rho: X^n \to \mathcal{B}^+$
implies Kraft storage for some representation $\rho: X \to \mathcal{B}^+$.

Example 5.13. Let $|\mathcal{B}| = 5$ and $|X| = 7$. To get Kraft storage for $X$, we would need
$i \cdot (|\mathcal{B}| - 1) + 1 = 4i + 1 = 7$, which is not possible. But for $X^2$, $i = 12$ gives us
$i \cdot (|\mathcal{B}| - 1) + 1 = 4i + 1 = 49$. ∎
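Condition (3) of Lemma 5.3 is a simple divisibility test, and Example 5.13 can be replayed with it. The sketch below is illustrative only; the function name `internal_nodes_needed` is an assumption and does not come from the text.

```python
def internal_nodes_needed(num_leaves, arity):
    """Return k with num_leaves == k * (arity - 1) + 1, or None if no such k exists."""
    k, remainder = divmod(num_leaves - 1, arity - 1)
    return k if remainder == 0 else None

# Example 5.13: |B| = 5, |X| = 7.
single_symbol = internal_nodes_needed(7, 5)      # no full 5-ary tree has 7 leaves
pairs = internal_nodes_needed(7 ** 2, 5)         # but one with 49 leaves exists
```

The same test with `arity = 2` always succeeds, which matches the fact that every leaf count of at least 1 is attainable by a full binary tree.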



We are now ready to prove the main results of this section. The proof of the 
following lemma is essentially the same as the proof of Lemma 4.3. 

Lemma 5.4. Let $X = \{x_1, x_2, \ldots, x_k\}$ and $\mathbb{D} = X^n$. Consider a
representation $f: X \to \mathcal{B}^+$. If $f$ achieves Kraft storage, then a
concatenation-preserving representation $\rho: \mathbb{D} \to \mathcal{B}^+$ defined by

$$\rho(d) = \bigcup_{i=1}^{n} \{f(d(i))\}_{n_i(d)},$$

where $n_i: \mathbb{D} \to \mathbb{N}$, also achieves Kraft storage.

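Proof: By induction on $n$ we prove that

$$\sum_{d \in X^n} |\mathcal{B}|^{-|\rho(d)|} = 1. \qquad (5.1)$$

Basis: For $n = 1$, $|\rho(d)| = |f(d)|$ and so

$$\sum_{d \in X^1} |\mathcal{B}|^{-|\rho(d)|} = \sum_{d \in X} |\mathcal{B}|^{-|f(d)|} = \sum_{x \in X} |\mathcal{B}|^{-|f(x)|} = 1.$$

Induction step: Assume that (5.1) holds for $n$. Then

$$\sum_{d \in X^{n+1}} |\mathcal{B}|^{-|\rho(d)|} = \sum_{d \in X^{n+1}} |\mathcal{B}|^{-\sum_{i=1}^{n+1} |f(d(i))|}$$

$$= \sum_{d \in X^n \cdot x_1} |\mathcal{B}|^{-\left(\sum_{i=1}^{n} |f(d(i))| + |f(x_1)|\right)} + \cdots + \sum_{d \in X^n \cdot x_k} |\mathcal{B}|^{-\left(\sum_{i=1}^{n} |f(d(i))| + |f(x_k)|\right)}$$

$$= |\mathcal{B}|^{-|f(x_1)|} \sum_{d \in X^n} |\mathcal{B}|^{-\sum_{i=1}^{n} |f(d(i))|} + |\mathcal{B}|^{-|f(x_2)|} \sum_{d \in X^n} |\mathcal{B}|^{-\sum_{i=1}^{n} |f(d(i))|} + \cdots + |\mathcal{B}|^{-|f(x_k)|} \sum_{d \in X^n} |\mathcal{B}|^{-\sum_{i=1}^{n} |f(d(i))|}.$$

By our inductive hypothesis this then gives us

$$\sum_{d \in X^{n+1}} |\mathcal{B}|^{-|\rho(d)|} = \sum_{d \in X^n} |\mathcal{B}|^{-|\rho(d)|} \cdot \sum_{x \in X} |\mathcal{B}|^{-|f(x)|} = 1.$$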

Theorem 5.4. Let $\mathbb{D} = X^n$. If there exists some $k \in \mathbb{N}$ for which
$|X| = k \cdot (|\mathcal{B}| - 1) + 1$, then there is an implementation $(\alpha, \rho)$ solving $(\Gamma, \mathbb{D})$
such that $\rho: \mathbb{D} \to \mathcal{B}^+$ achieves Kraft storage and each $\gamma_i \in \Gamma$ achieves Kraft
access.

Proof: Since $|X| = k \cdot (|\mathcal{B}| - 1) + 1$, we know by Lemma 3.1 that there is a $|\mathcal{B}|$-ary
tree $T$ with $|X|$ leaves and node labels chosen from the set $\{0, 1, \ldots, r\}$, for $r \le k - 1$.
We can use this tree $T$ to define the storage optimal representation $f: X \to \mathcal{B}^+$. Now
define the concatenation-preserving representation $\rho: \mathbb{D} \to \mathcal{B}^+$ by

$$\rho(d) = \bigcup_{i=1}^{n} \{f(d(i))\}_{(r+1)(i-1)}.$$

By Lemma 5.4, since $f$ achieves Kraft storage so does $\rho$. Also, if we implement
$\gamma_i \in \Gamma$ by the same tree $T$ except replacing node label $j$ by label $j + (r+1) \cdot (i-1)$,
then each $\gamma_i \in \Gamma$ achieves Kraft access. ∎

From Lemma 5.3, Theorem 5.4 holds if instead of the condition $|X| = k \cdot (|\mathcal{B}| - 1) + 1$
we have the condition that there be some representation $\rho: X \to \mathcal{B}^+$ which attains
Kraft storage or that there be some implementation for $\gamma_i$ which achieves Kraft
access. Trivially, the above theorem also holds for $\mathbb{D} = \{\lambda\} \cup X$, when
$|X| + 1 = k \cdot (|\mathcal{B}| - 1) + 1$.

Corollary 5.4.1. Let $\mathbb{D} = \{\lambda\} \cup X$. If there exists some $k \in \mathbb{N}^+$ for which
$|X| + 1 = k \cdot (|\mathcal{B}| - 1) + 1$, then there is an implementation $(\alpha, \rho)$ solving
$(\Gamma, \mathbb{D})$ such that $\rho: \mathbb{D} \to \mathcal{B}^+$ achieves Kraft storage and $\gamma_1 \in \Gamma$ achieves Kraft
access.

We present an example to illustrate how $\rho$ and $f$ in the proof of Theorem 5.4 might
be chosen.

Example 5.14. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b,c,d,e\}$, and $\mathbb{D} = X^2$. Then
$|X| = k \cdot (|\mathcal{B}| - 1) + 1$ is satisfied by $k = 4$, and there is a binary tree with five leaves
and four internal nodes whose labels are in $\{0,1,2,3\}$. In fact, there are many such
trees, and we (arbitrarily) pick $T$ to be the tree shown in Figure 5.5a. Using $T$, we
define the representation $f: X \to \mathcal{B}^+$ by

f(a) = 00__
f(b) = 01__
f(c) = 1_00
f(d) = 1_01
f(e) = 1_1_

By inspection, $f$ attains Kraft storage. We define the concatenation-preserving
representation $\rho: \mathbb{D} \to \mathcal{B}^+$ by

$$\rho(d) = f(d(1)) \cdot f(d(2)).$$

Figure 5.5. Trees for $T$ and $\gamma_2$ of Example 5.14.

So $T$ is also the access tree for $\gamma_1$, and the access tree for $\gamma_2$ is the same as $T$ but
has each node label $j$ replaced by the label $j + 4$. The tree for $\gamma_2$ is illustrated in
Figure 5.5b. Then we have, for instance, ρ(bc) = 01__1_00. The representation $\rho$
achieves Kraft storage because

$$\sum_{d \in X^2} 2^{-|\rho(d)|} = 9 \cdot 2^{-4} + 12 \cdot 2^{-5} + 4 \cdot 2^{-6} = 1.$$

By inspection of the trees for $\gamma_1$ and $\gamma_2$, we also attain Kraft access. ∎
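The Kraft sum of Example 5.14 can be recomputed by enumerating all 25 lists in $X^2$. This is a sketch under stated assumptions: words are written as strings in which "_" marks an unused cell, the size of a word is its number of written cells, and the code word `1_1_` for `e` is inferred from a tree with internal node labels 0 to 3 rather than stated outright in the scanned text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Code words of Example 5.14; "_" marks a cell that is not part of the word.
f = {"a": "00__", "b": "01__", "c": "1_00", "d": "1_01", "e": "1_1_"}

def size(word):
    """Number of cells actually written by a word."""
    return sum(ch != "_" for ch in word)

def rho(d):
    """Concatenation-preserving representation: each field occupies four cells."""
    return "".join(f[x] for x in d)

kraft_f = sum(Fraction(1, 2 ** size(w)) for w in f.values())
size_counts = Counter(size(rho(d)) for d in product(f, repeat=2))
kraft_rho = sum(count * Fraction(1, 2 ** s) for s, count in size_counts.items())
```

`size_counts` groups the 25 lists by representation size, which gives the three terms of the sum directly.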



Notice, however, that the representation $\rho$ in Theorem 5.4 has many "gaps" in
it. Even if we had constructed the tree $T$ so that each node at depth $j$ had label $j$,
we would still have had gaps, unless $T$ were of uniform depth. If we require that $\rho$
be located in consecutive cells, then we cannot obtain Kraft access unless for all
$d_1, d_2 \in X^n$, $|\rho(d_1)| = |\rho(d_2)|$; i.e., $\#[\gamma_i(\rho(d))] = \#[\gamma_j(\rho(d))]$, for all
$\gamma_i, \gamma_j \in \Gamma$. We now show that if in Theorem 5.4 it had also been the case that
$|X| = |\mathcal{B}|^k$, then there would have been an implementation achieving Kraft storage
and access with a fixed size representation and without any "gaps".

Theorem 5.5. Let $\mathbb{D} = X^n$ and $|X| = |\mathcal{B}|^k$ for $n, k \in \mathbb{N}^+$. Then there is an
implementation $(\alpha, \rho)$ solving $(\Gamma, \mathbb{D})$ such that $\rho: \mathbb{D} \to \mathcal{B}^{nk}$ is a fixed size
representation achieving Kraft storage, and each $\gamma_i \in \Gamma$ achieves Kraft access.

Proof: Since we are given that $|X| = |\mathcal{B}|^k$, the equation $|X| = i \cdot (|\mathcal{B}| - 1) + 1$ is
satisfied for $i = \sum_{j=0}^{k-1} |\mathcal{B}|^j$. Theorem 5.4 immediately tells us that there is some
representation with Kraft storage and access, but we want to show that there is, in
fact, such a fixed size representation. As in the proof of Theorem 5.4, we define
the concatenation-preserving representation $\rho: \mathbb{D} \to \mathcal{B}^{nk}$ by

$$\rho(d) = f(d(1)) \cdot f(d(2)) \cdot \ldots \cdot f(d(n))$$

where $f: X \to \mathcal{B}^k$ corresponds to a tree $T$ of uniform depth $k$ where each internal
node at depth $j$ has label $j$. Certainly $f$ and therefore $\rho$ both achieve Kraft storage,
as verified by

$$\sum_{d \in X^n} |\mathcal{B}|^{-|\rho(d)|} = \sum_{d \in X^n} |\mathcal{B}|^{-nk} = |X|^n \cdot |\mathcal{B}|^{-nk} = |X|^n \cdot |X|^{-n} = 1.$$

Also, we implement $\gamma_{m+1} \in \Gamma$ by the same tree $T$, with labels $j$ replaced by $mk + j$.
Each $\gamma_m \in \Gamma$ achieves Kraft access, since $\#[\gamma_m(\rho(d))] = k$ and

$$\sum_{x \in X} |\mathcal{B}|^{-\#[x]} = \sum_{x \in X} |\mathcal{B}|^{-k} = |X| \cdot |\mathcal{B}|^{-k} = 1. \; ∎$$

We can give an example, similar to Example 5.12, which illustrates this theorem. 

Example 5.15. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b,c,d\}$, and $\mathbb{D} = X^2$. Notice that $|X| = |\mathcal{B}|^2$,
and Figure 5.6a shows a tree $T$ of uniform depth two corresponding to the
representation $f: X \to \mathcal{B}^2$. Then we define the representation $\rho: \mathbb{D} \to \mathcal{B}^4$ by
$\rho(d) = f(d(1)) \cdot f(d(2))$. For instance,

ρ(ac) = 0010
ρ(ad) = 0011
ρ(bb) = 0101
ρ(dc) = 1110

The tree $T$ of Figure 5.6a is the access tree for $\gamma_1$, and the access tree for $\gamma_2$ is
shown in Figure 5.6b. ∎
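The fixed size construction of Theorem 5.5 is easy to make concrete. The sketch below assumes that the uniform-depth tree assigns code words in lexicographic order, which reproduces the four sample values of Example 5.15; the helper names `fixed_size_code`, `rho`, and `lookup` are illustrative.

```python
from itertools import product

def fixed_size_code(symbols, alphabet="01"):
    """Give each symbol a distinct word of length k, requiring len(symbols) == len(alphabet)**k."""
    k = 0
    while len(alphabet) ** k < len(symbols):
        k += 1
    if len(alphabet) ** k != len(symbols):
        raise ValueError("alphabet size is not a power of the machine alphabet size")
    words = ("".join(w) for w in product(alphabet, repeat=k))
    return dict(zip(symbols, words)), k

f, k = fixed_size_code("abcd")
decode = {word: symbol for symbol, word in f.items()}

def rho(d):
    """Concatenate the fixed-length fields with no gaps."""
    return "".join(f[x] for x in d)

def lookup(memory, i):
    """Answer the i-th table lookup question by reading cells (i-1)*k .. i*k - 1 only."""
    return decode[memory[(i - 1) * k : i * k]]
```

Every question reads exactly `k` cells, and the cells read by different questions never overlap, which is the uniform-depth, gap-free situation the theorem describes.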

Figure 5.6. Trees for $T$ and $\gamma_2$ of Example 5.15.

Analogous to Theorem 5.4, we have the following corollary.

Corollary 5.5.1. Let $\mathbb{D} = \{\lambda\} \cup X$ and $|X| + 1 = |\mathcal{B}|^k$ for some $k \in \mathbb{N}^+$. Then
there is an implementation $(\alpha, \rho)$ solving $(\Gamma, \mathbb{D})$ such that $\rho: \mathbb{D} \to \mathcal{B}^k$ is a
fixed size representation achieving Kraft storage, and each $\gamma_i \in \Gamma$ achieves
Kraft access.

What we have proved in this section is a weak equivalence between the
requirements that $\mathbb{D} = X^n$ (or $\mathbb{D} = \{\lambda\} \cup X$) and that there be some fixed size
implementation in which we achieve Kraft storage and access. More precisely,
Theorem 5.2 told us that if there is a fixed size representation $\rho: \mathbb{D} \to \mathcal{B}^+$ which
achieves Kraft storage and for which each $\gamma_i \in \Gamma$ achieves Kraft access, then
$\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X$. Conversely, Theorem 5.5 and Corollary 5.5.1 essentially
tell us that if $\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X$, then there is some fixed size representation
which achieves Kraft storage and access. The condition $|X| = |\mathcal{B}|^k$ (or
$|X| + 1 = |\mathcal{B}|^k$) was put in to avoid "rounding errors". If we do not have $|X| = |\mathcal{B}|^k$
for $\mathbb{D} = X^n$, then either we do not have Kraft storage or else our tree must have
leaves at (at least) two depths, $j$ and $j + 1$. This would cause
$nj \le |\rho(d)| \le n(j + 1)$ and so $\rho$ would not be of exactly fixed size. Or else we
could let $\rho: \mathbb{D} \to \mathcal{B}^{n(j+1)}$ be fixed size and then we would not quite attain Kraft
storage. Thus, Theorems 5.2 and 5.5 allow us to prove the following result.

5.3 Endmarker Representations

Recall from Section 5.1 that an endmarker representation has some fixed
symbol or sequence of symbols in $\mathcal{B}$ which are always at the "end" of the list.
Example 5.5a is an example of an endmarker representation. The representation $\rho$
in Example 2.7 is also an endmarker representation, with endmarker 0. We now
give a formal definition.

Definition. Let $f$ be a total function $f: \mathbb{D} \to \mathcal{B}^+$, and let $\theta \in \mathcal{B}^+$ ($\theta \neq \emptyset$).
For each $d \in \mathbb{D}$, let $n(d) \in \mathbb{N}$ such that $n(d) > \max D(f(d))$. Then a
representation $\rho: \mathbb{D} \to \mathcal{B}^+$ which is defined by

$$\rho(d) = f(d) \cup \{\theta\}_{n(d)}$$

is an endmarker representation. The relation $\theta$ is known as the endmarker,
and the function $f$ is the list component of $\rho$.

To illustrate what this definition says, we present the following example. 

Example 5.16. Let $X = \{a,b,c\}$, $\mathcal{B} = \{0,1\}$, and $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. Define the function
$f: X \cup \{\emptyset\} \to \mathcal{B}^+$ by

f(a) = 0_0
f(b) = 10_
f(c) = 11_
f(∅) = 0_1

If we then define $f': \mathbb{D} \to \mathcal{B}^+$ by

$$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{3(i-1)},$$

then the representation

$$\rho(d) = f'(d) \cup \{f(\emptyset)\}_{3|d|}$$

is a concatenation-preserving endmarker representation. For $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, it is easy
to verify that $\rho$ achieves Kraft storage, since

$$|\rho(d)| = 2 \cdot |d| + 2.$$

Thus,

$$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = \sum_{i \in J} \sum_{d \in X^i} 2^{-|\rho(d)|} = \sum_{i=0}^{\infty} |X|^i \cdot 2^{-(2i+2)} = \frac{1}{4} \sum_{i=0}^{\infty} \left(\frac{3}{4}\right)^i = 1.$$

Note that no finite $|J|$ will give us Kraft storage.

Now consider answering a table lookup question $\gamma_i \in \Gamma$. For $\gamma_1$ we need only
access $m(0)$ and $m(1)$, or else $m(0)$ and $m(2)$. On the other hand, to answer the
question $\gamma_2$, accessing just $m(3)$ and $m(4)$ (or else $m(3)$ and $m(5)$) may not give
the correct answer. In particular, unless we have already determined that the
answer is $\emptyset$, then we must verify that $|d| > 1$. This requires accessing $m(0)$ and
possibly $m(2)$. Possible access trees $T_i$ for each $\gamma_i$ can be constructed as indicated
in Figure 5.7, where we write $\{T_i\}_k$ to denote the tree $T_i$ with each node label $j$
replaced by the label $j + k$. These trees correspond to reading the necessary
memory cells in a left to right order. It would also be possible to read the cells
essentially from right to left. For either method, once $f(\emptyset)$ is encountered for $\gamma_i$,
then it is known that $|d| < i$. ∎
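The left to right reading strategy of Example 5.16 can be sketched as follows. This is an illustration under stated assumptions rather than the access trees of Figure 5.7 themselves: memory is modelled as a dictionary from cell index to bit, the label `"END"` stands for the end symbol, and the helper names are hypothetical. Earlier fields are only tested for "is this the end symbol", which needs $m(0)$ and possibly $m(2)$ for the first field, as described above.

```python
from fractions import Fraction

# Field codes of Example 5.16; "END" stands for the end symbol, "_" for an unused cell.
F = {"a": "0_0", "b": "10_", "c": "11_", "END": "0_1"}

def encode(d):
    """Return rho(d) as {cell: bit}: field i starts at cell 3*(i-1), the endmarker at 3*|d|."""
    memory = {}
    for field, symbol in enumerate(list(d) + ["END"]):
        for offset, ch in enumerate(F[symbol]):
            if ch != "_":
                memory[3 * field + offset] = int(ch)
    return memory

def is_end(memory, field, accessed):
    """Check whether a field holds the end symbol, reading as few cells as possible."""
    base = 3 * field
    accessed.append(base)
    if memory[base] == 1:          # b or c, so not the end
        return False
    accessed.append(base + 2)
    return memory[base + 2] == 1   # 0_1 is the end symbol, 0_0 is a

def lookup(memory, i):
    """Answer question i left to right; returns (answer or None, list of cells read)."""
    accessed = []
    for field in range(i - 1):     # make sure the list has at least i-1 elements
        if is_end(memory, field, accessed):
            return None, accessed
    base = 3 * (i - 1)
    accessed.append(base)
    if memory[base] == 1:
        accessed.append(base + 1)
        return ("c" if memory[base + 1] else "b"), accessed
    accessed.append(base + 2)
    return (None if memory[base + 2] else "a"), accessed

def kraft_partial_sum(max_len, alphabet_size=3):
    """Sum of 2^-(2i+2) over all lists of length i <= max_len."""
    return sum(alphabet_size ** i * Fraction(1, 2 ** (2 * i + 2)) for i in range(max_len + 1))
```

The partial Kraft sums fall short of 1 by exactly $(3/4)^{n+1}$ when lists up to length $n$ are included, which illustrates why no finite index set attains Kraft storage here.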




Figure 5.7. Trees for $\gamma_i$ of Example 5.16.

The endmarker representations we have thus far seen are all
concatenation-preserving representations, but there is no such requirement in the
definition. In fact, there is not even any requirement that the endmarker be
necessary; i.e., for an endmarker representation $\rho(d) = f(d) \cup \{\theta\}_{n(d)}$ it may be
the case that $f(\mathbb{D})$ itself is a representation and thus the endmarker is
superfluous. Also, there is no restriction that the endmarker not appear in $f(d)$.
Even if the pattern $\theta \in \mathcal{B}^+$ does not appear in $f(d)$, there may be "holes" in $f(d)$,
which allow the possibility of another user writing $\theta$. Thus, it may not be the case
that the first occurrence of $\theta$ serves as the endmarker.

Example 5.17. Let $\mathcal{B} = \{0,1\}$ and, for $\mathbb{D} = \{d_1, d_2, d_3, d_4, d_5\}$, define the
function $f: \mathbb{D} \to \mathcal{B}^+$ by

f(d_1) = 0_0
f(d_2) = 0_1
f(d_3) = 10_
f(d_4) = _10
f(d_5) = 11_

If we let $\theta = \{(0,1), (1,0)\}$, we can then define the endmarker representation
$\rho: \mathbb{D} \to \mathcal{B}^+$ by

ρ(d_1) = 0_0_10
ρ(d_2) = 0_1_10
ρ(d_3) = 1010
ρ(d_4) = _1010
ρ(d_5) = 1110

The endmarker here is not superfluous because it does enable us to distinguish
between $\rho(d_1)$ and $\rho(d_4)$ and between $\rho(d_4)$ and $\rho(d_5)$. On the other hand,
even if we were to eliminate $d_4$ from the domain, $\rho$ would still be an endmarker
representation. Notice also that ρ(d_1) = 0_0_10, and thus if another user sets
$m(1) = 1$ then the actual endmarker is not the first occurrence of $\theta$. In fact, $f(d_3)$
itself contains the set $\theta$. ∎

We usually have in mind a more restricted notion of an endmarker
representation, where we require $\theta$ to be distinguishable and reserve $\theta$ solely to
indicate the end of the list. (Of course, if the representation has holes in it, then it
is still possible for other users to write $\theta$.) Thus, if we read a list representation
from left to right and access no cells not in the representation, then encountering
$\theta$ immediately tells us when we've reached the end. Most of our examples will be of
this form.

Notice that the function $f$ in Example 5.16 has fixed position fields.
However, the endmarker in the representation $\rho$ has a displacement function
$n(d) = 3 \cdot |d|$. So $n$ is not a constant function, and the endmarker is not always in
the same memory position. In fact, if the endmarker were always at the same
location, then there would be no point in having an endmarker at all; there is no
such concatenation-preserving endmarker representation. We make the following
definition.

Definition. Let $\rho: \mathbb{D} \to \mathcal{B}^+$ be a concatenation-preserving endmarker
representation, with endmarker $\theta$, and formed from a
concatenation-preserving function $f$ with fixed position fields $n_i$. If $\rho$ is of
the form

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\theta\}_{n_{|d|+1}}$$

then $\rho$ is said to be a fixed position field endmarker representation.

Thus, the representation $\rho$ in Example 5.16 is a fixed position field endmarker
representation.

In Example 5.16 we saw an endmarker representation that achieves Kraft
storage when $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. We can, in fact, show that achieving Kraft storage implies
that $\max_{d \in \mathbb{D}} |\rho(d)|$ is unbounded.

Theorem 5.7. If an endmarker representation $\rho: \mathbb{D} \to \mathcal{B}^+$ achieves Kraft
storage, then $\neg(\exists n \in \mathbb{N})(\forall d \in \mathbb{D})(|\rho(d)| < n)$.

Proof: Assume that $(\exists n \in \mathbb{N})(\forall d \in \mathbb{D})(|\rho(d)| < n)$.
Then it is possible to choose $d_k \in \mathbb{D}$ such that

$$\max D(\rho(d_k)) = \max_{d \in \mathbb{D}} D(\rho(d)).$$

Thus, no $\rho(d)$ occupies a larger memory cell location than $\rho(d_k)$. By the
definition of an endmarker representation, there is some function $f$ such that

$$\rho(d_k) = f(d_k) \cup \{\theta\}_{n(d_k)}.$$

Now let $r = \min D(\theta)$, and choose $b \in \mathcal{B}$ such that $b$ is not a prefix of $\theta$. (Since
$|\mathcal{B}| \ge 2$, there must always be such a $b$.) Consider the string

b = f(c/ k ) U{(n(e/ k ) +r, b )}*# + 
For all c/j $ ID, b and p{d { ) are distinguishable. In other words, there is no d i t ID 
such that b Q pjid^. So by Theorem 3.3 p does not achieve Kraft storage. Thus, 
our original assumption must have been wrong, and we conclude that 

$\neg(\exists n \in \mathbb{N})(\forall d \in \mathbb{D})(|\rho(d)| < n)$. ∎

It immediately follows that if an endmarker representation achieves Kraft storage,
then the domain $\mathbb{D}$ must be infinite and also that the index set $J$ must be infinite.

Corollary 5.7.1. If an endmarker representation $\rho: \mathbb{D} \to \mathcal{B}^+$ achieves Kraft
storage, then $\neg(\exists n \in \mathbb{N})(|\mathbb{D}| < n)$.

Corollary 5.7.2. If an endmarker representation $\rho: \mathbb{D} \to \mathcal{B}^+$ achieves Kraft
storage, then $\neg(\exists n \in \mathbb{N})(\max_{i \in J} i < n)$.

Thus, when we are discussing endmarker representations, we frequently consider
$\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$.

Notice that since achieving Kraft storage tells us that the domain must be
infinite, we immediately know that no endmarker representation can achieve both
Kraft storage and Kraft access.

Theorem 5.8. There is no endmarker representation that achieves Kraft
storage and also achieves Kraft access for all $\gamma_i \in \Gamma$.

Proof: From Theorems 4.9 and 4.10, we know that if a representation $\rho$ achieves
Kraft storage and Kraft access for all $\gamma_i \in \Gamma$, then $\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X^n$. But
by Corollary 5.7.1 we know that $|\mathbb{D}|$ cannot be finite for an endmarker
representation that achieves Kraft storage. Thus, there is no endmarker
representation that achieves both Kraft storage and Kraft access. ∎

Recall again the representation $f$ in Example 5.16, which achieved Kraft
storage. We can show that this result generalizes. In particular, given any
representation $f: X \cup \{\emptyset\} \to \mathcal{B}^+$ which achieves Kraft storage, a
concatenation-preserving endmarker representation $\rho$ formed from $f$ also achieves
Kraft storage. Before we prove this, however, we introduce some terminology and
prove a lemma. We begin with the following definition.

Definition. Consider a full $|\mathcal{B}|$-ary tree $T'$ with $|\mathcal{B}|^k$ nodes at depth $k$, for
all $k \in \mathbb{N}$. Assume that some of the (internal) nodes are labelled $\emptyset$ but that
$T'$ has the property that if a node is labelled $\emptyset$ then none of the descendants
of that node is labelled $\emptyset$. We use the term $\emptyset$-node to refer to a node labelled
$\emptyset$ or the descendant of a node labelled $\emptyset$. We then let $\xi(k)$ denote the fraction of
the nodes in $T'$ at depth $k$ that are $\emptyset$-nodes.

Since $\xi(k)$ is a fraction of nodes that are $\emptyset$-nodes, it is clear that $0 \le \xi(k) \le 1$. Also
$\xi(k + 1) \ge \xi(k)$, since a $\emptyset$-node at some level leads to the same fraction of
$\emptyset$-node descendants at the next level. The following example should clarify what is
meant by a $\emptyset$-node and by $\xi(k)$.

Example 5.18. Consider the tree $T'$ in Figure 5.8. For simplicity we have deleted
the node labels indicating memory cell locations. We have, however, retained the
external label $\emptyset$ on certain nodes and marked each $\emptyset$-node with an "x". Notice
that all descendants of nodes labelled $\emptyset$ are themselves $\emptyset$-nodes. There are 1
$\emptyset$-node at depth 2, 3 $\emptyset$-nodes at depth 3, 8 $\emptyset$-nodes at depth 4, 19 $\emptyset$-nodes at
depth 5, etc. Thus

$$\xi(0) = \xi(1) = 0, \qquad \xi(2) = \frac{1}{4}, \qquad \xi(3) = \xi(2) + \frac{1}{8} = \frac{3}{8}.$$

We shall have occasion to refer back to this tree $T'$ in a later example. ∎

Figure 5.8. Tree $T'$ from Example 5.18.



In order to motivate some of the terminology used in the next lemma, let us
consider another example.

Example 5.19. Let $X = \{a,b\}$, $\mathcal{B} = \{0,1\}$, and consider the representation
$f: X \cup \{\emptyset\} \to \mathcal{B}^+$ defined by

f(a) = 1
f(b) = 00
f(∅) = 01

Then $f$ achieves Kraft storage and corresponds to the tree $T_f$ shown in Figure 5.9a.
Now, for $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, define a concatenation-preserving endmarker representation
$\rho: \mathbb{D} \to \mathcal{B}^+$ by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{(0,0), (1,1)\}_{n(d)}$$

where $n_i(d) = \sum_{j=1}^{i-1} |f(d(j))|$ and $n(d) = \sum_{j=1}^{|d|} |f(d(j))|$. Then we can construct a tree
$T$ for representation $\rho$ as in Figure 5.9b. ∎

We can now prove the following lemma.

We can now prove the following lemma. 

Lemma 5.5. Consider a prefix representation $f: X \cup \{\emptyset\} \to \mathcal{B}^+$ which achieves
Kraft storage. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a concatenation-preserving
endmarker representation $\rho: \mathbb{D} \to \mathcal{B}^+$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{f(\emptyset)\}_{n(d)},$$

where $n_i: \mathbb{D} \to \mathbb{N}$, $n: \mathbb{D} \to \mathbb{N}$. Let $T$ be a $|\mathcal{B}|$-ary tree corresponding to $\rho$, and
let $T'$ be an extension of $T$ which keeps the $\emptyset$-node labels of $T$ but extends
the tree so that $T'$ has $|\mathcal{B}|^k$ nodes at depth $k$, for all $k \in \mathbb{N}$, and the $\emptyset$ labels
now label internal nodes. Then

$$\lim_{k \to \infty} \xi(k) = 1.$$

Proof: Since the prefix representation $f$ achieves Kraft storage, there is a
corresponding full $|\mathcal{B}|$-ary tree $T_f$, as shown in Figure 5.9a. Assume that
$|f(\emptyset)| = r$ and that $\max_{x \in X \cup \{\emptyset\}} |f(x)| = p$. Then $T_f$ has (maximum) depth $p$ and
the depth of its $\emptyset$-node is $r$. The tree $T$ corresponding to $\rho$ is formed from $T_f$ by
placing a copy of $T_f$ at each leaf not labelled $\emptyset$ and doing this indefinitely. (The
memory cells to be accessed need to be altered according to the values of $n_i(d)$ and
$n(d)$. Since we know, however, that no path will contain the same memory
location twice, we choose to ignore these access labels and are concerned only with
the external labels at a $\emptyset$-node indicating the $d$ such that $\rho(d)$ leads to this node.)
$T'$ is the extension of $T$ where we keep the $\emptyset$-node leaf labels but extend from
each of these leaves a full $|\mathcal{B}|$-ary tree. Thus, for all $k \in \mathbb{N}$, $T'$ has $|\mathcal{B}|^k$ nodes at
depth $k$.

Figure 5.9. Trees $T_f$ and $T$ from Example 5.19.

We are now ready to determine $\lim_{k \to \infty} \xi(k)$. It is clear that $\xi(i + 1) \ge \xi(i)$,
because if there are $j$ $\emptyset$-nodes at depth $i$, then there are $|\mathcal{B}| \cdot j$ descendants of these
$\emptyset$-nodes at depth $i + 1$. Thus, the fraction of these $\emptyset$-nodes cannot decrease.
Also, there may be more $\emptyset$-nodes at depth $i + 1$, corresponding to copies of $T_f$
with leaves labelled $\emptyset$ at depth $i + 1$. Each node at depth $i$ which is not a $\emptyset$-node
will have a descendant within depth $p$ which is a $\emptyset$-node. Thus, at least $1/|\mathcal{B}|^p$ of the
descendants of non $\emptyset$-nodes at depth $i$ will themselves be $\emptyset$-nodes at depth $i + p$.
Since the fraction of non $\emptyset$-nodes at depth $i$ is $1 - \xi(i)$,

$$\xi(i + p) \ge \xi(i) + |\mathcal{B}|^{-p} (1 - \xi(i)) = \frac{1}{|\mathcal{B}|^p} + \frac{|\mathcal{B}|^p - 1}{|\mathcal{B}|^p} \cdot \xi(i).$$

If we look at the values of $\xi(k)$ at depths $0, p, 2 \cdot p$, etc., we find that

$$\xi(k \cdot p) \ge \frac{1}{|\mathcal{B}|^p} \sum_{j=0}^{k-1} \left( \frac{|\mathcal{B}|^p - 1}{|\mathcal{B}|^p} \right)^j = 1 - \left( \frac{|\mathcal{B}|^p - 1}{|\mathcal{B}|^p} \right)^k.$$

Of course we know that $\xi(k \cdot p) \le 1$, and so we conclude that

$$\lim_{k \to \infty} \xi(k) = 1. \; ∎$$

To illustrate the method used in proving Lemma 5.5, we refer back to Example 5.19.

Example 5.20. Recall the representation $\rho$ from Example 5.19. The extension of $T$
to a tree $T'$ with $|\mathcal{B}|^k$ nodes at depth $k$ is the tree $T'$ of Example 5.18, shown in
Figure 5.8. The tree $T_f$, from which $T$ and $T'$ were constructed, has maximum
depth 2, and the depth of its $\emptyset$-node is 2. We want to verify that

$$\xi(i + 2) \ge \xi(i) + \frac{1}{4} \cdot (1 - \xi(i)).$$

The fraction of non $\emptyset$-nodes at depth $i$ is $1 - \xi(i)$. Every non $\emptyset$-node at depth $i$
serves either as a root of another copy of $T_f$ (see node A in Figure 5.10a) or else is
an internal node of some $T_f$ copy (see node B of Figure 5.10b). In the former case,
we get a new $\emptyset$-node at depth $i + 2$. In the latter case, we get a new $\emptyset$-node at
depth $i + 1$, which gives us two additional $\emptyset$-nodes at depth $i + 2$. ∎
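The counts quoted in Example 5.18 and the inequality of Example 5.20 can be reproduced numerically. The following sketch is an illustration under stated assumptions: the tree $T_f$ is described only by the depths of its ordinary leaves and of its end leaf, `roots[k]` counts copies of $T_f$ rooted at depth $k$, and the function name `xi_values` is hypothetical.

```python
from fractions import Fraction

def xi_values(leaf_depths, end_depth, max_depth, arity=2):
    """Return (psi, xi) for depths 0..max_depth.

    leaf_depths: depths of the leaves of T_f that are not the end symbol
    end_depth:   depth of the end-symbol leaf of T_f
    psi[k]:      number of end leaves of T at depth k
    xi[k]:       fraction of nodes at depth k of the extended tree that are end-nodes
    """
    roots = [0] * (max_depth + 1)
    psi = [0] * (max_depth + 1)
    roots[0] = 1
    for k in range(max_depth + 1):
        for depth in leaf_depths:          # every ordinary leaf starts a new copy of T_f
            if k + depth <= max_depth:
                roots[k + depth] += roots[k]
        if k + end_depth <= max_depth:     # every copy contributes one end leaf
            psi[k + end_depth] += roots[k]
    xi, marked = [], 0
    for k in range(max_depth + 1):
        marked = arity * marked + psi[k]   # descendants of earlier end-nodes plus new ones
        xi.append(Fraction(marked, arity ** k))
    return psi, xi

# T_f of Example 5.19: f(a) = 1 (depth 1), f(b) = 00 (depth 2), end symbol 01 (depth 2).
psi, xi = xi_values(leaf_depths=[1, 2], end_depth=2, max_depth=60)
end_node_counts = [xi[k] * 2 ** k for k in range(2, 6)]
```

With these inputs `end_node_counts` gives the number of marked nodes at depths 2 through 5, and the sequence `xi` can be inspected to see the fraction approach 1 as the depth grows.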

Lemma 5.5 allows us to prove the following result. 



Figure 5.10. Origination of new $\emptyset$-nodes at depth $i + 2$.

Theorem 5.9. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, and consider a representation $f: X \cup \{\emptyset\} \to \mathcal{B}^+$
which achieves Kraft storage. Assume that the set $f(X \cup \{\emptyset\})$ forms a prefix
code, and let $\rho: \mathbb{D} \to \mathcal{B}^+$ be a concatenation-preserving endmarker
representation defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{f(\emptyset)\}_{n(d)},$$

where $n_i: \mathbb{D} \to \mathbb{N}$, $n: \mathbb{D} \to \mathbb{N}$. Then $\rho$ achieves Kraft storage.



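Proof: Let $\psi(i)$ be the distribution function

$$\psi(i) = |\{\rho(d) : |\rho(d)| = i\}|.$$

So $\psi(i)$ corresponds to the number of $\emptyset$-nodes at depth $i$ that have no $\emptyset$-node
ancestors. Then

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = \sum_{i=0}^{\infty} \psi(i) \cdot |\mathcal{B}|^{-i} = \lim_{k \to \infty} \sum_{i=0}^{k} \psi(i) \cdot |\mathcal{B}|^{-i} = \lim_{k \to \infty} |\mathcal{B}|^{-k} \sum_{i=0}^{k} \psi(i) \cdot |\mathcal{B}|^{k-i}.$$

A $\emptyset$-node at depth $i$ is an ancestor of $|\mathcal{B}|^{k-i}$ descendants at depth $k$, and so there
are $\sum_{i=0}^{k} \psi(i) \cdot |\mathcal{B}|^{k-i}$ $\emptyset$-nodes at depth $k$. Since at depth $k$ there are a total of $|\mathcal{B}|^k$
nodes, the fraction of nodes at depth $k$ which are $\emptyset$-nodes is

$$|\mathcal{B}|^{-k} \sum_{i=0}^{k} \psi(i) \cdot |\mathcal{B}|^{k-i} = \xi(k).$$

Applying Lemma 5.5 gives the desired result:

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = \lim_{k \to \infty} \xi(k) = 1. \; ∎$$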

The above theorem still holds if we do not require that $f(X \cup \{\emptyset\})$ be a prefix
code.

Theorem 5.10. Consider a representation $f: X \cup \{\emptyset\} \to \mathcal{B}^+$ which achieves
Kraft storage. For $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, let $\rho: \mathbb{D} \to \mathcal{B}^+$ be a concatenation-preserving
endmarker representation, where

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{f(\emptyset)\}_{n(d)}$$

for $n_i: \mathbb{D} \to \mathbb{N}$, $n: \mathbb{D} \to \mathbb{N}$. Then $\rho$ achieves Kraft storage.

Proof: Consider any representation $f_1: X \cup \{\emptyset\} \to \mathcal{B}^+$, and recall from Chapter 3 the
statement of the Kraft inequality. If $f_1$ achieves Kraft storage, then

$$\sum_{x \in X \cup \{\emptyset\}} |\mathcal{B}|^{-|f_1(x)|} = 1$$

and the Kraft inequality is satisfied (with equality). Thus, there is some function
$f_2: X \cup \{\emptyset\} \to \mathcal{B}^+$ such that $f_2(X \cup \{\emptyset\})$ is a prefix code and $|f_2(x)| = |f_1(x)|$ for
all $x \in X \cup \{\emptyset\}$. By Theorem 5.9 we know that for any concatenation-preserving
representation $\rho_1$ formed from $f_2$,

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho_1(d)|} = 1 = \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-\left(\sum_{i=1}^{|d|} |f_2(d(i))| + |f_2(\emptyset)|\right)}.$$

Since

$$|\rho(d)| = \sum_{i=1}^{|d|} |f_1(d(i))| + |f_1(\emptyset)| = \sum_{i=1}^{|d|} |f_2(d(i))| + |f_2(\emptyset)|,$$

we can conclude that

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = 1$$

and so $\rho$ achieves Kraft storage. ∎



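We can verify directly that the representation $\rho$ from Examples 5.19 and 5.20
achieves Kraft storage.

Example 5.21. What we want to show is that

$$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = \sum_{i=0}^{\infty} \psi(i) \cdot 2^{-i} = 1.$$

Referring to the tree $T$ of Figure 5.9b, we see that

ψ(0) = ψ(1) = 0
ψ(2) = 1
ψ(3) = 1
ψ(4) = 2
ψ(5) = 3

In fact, whenever a copy of $T_f$ terminates at depth $i$, then there is a leaf from $T_f$ at
depth $i - 1$ which serves as the root of another copy of $T_f$, one which has a $\emptyset$-leaf
at depth $i + 1$. Similarly, if a copy of $T_f$ terminates at depth $i - 1$, then there is a
leaf of $T_f$ also at depth $i - 1$ which serves as the root of a copy of $T_f$, leading to a
$\emptyset$-leaf at depth $i + 1$. Thus, we can define the distribution function $\psi$ by

ψ(1) = 0
ψ(2) = 1
ψ(i + 1) = ψ(i) + ψ(i - 1)

Solving this Fibonacci expression, we find that

$$\psi(i) = \frac{5 - \sqrt{5}}{10} \left( \frac{1 + \sqrt{5}}{2} \right)^i + \frac{5 + \sqrt{5}}{10} \left( \frac{1 - \sqrt{5}}{2} \right)^i$$

for $i \ge 1$. Thus, we can directly show that $\rho$ achieves Kraft storage.

$$\sum_{i=1}^{\infty} \psi(i) \cdot 2^{-i} = \sum_{i=1}^{\infty} \left[ \frac{5 - \sqrt{5}}{10} \left( \frac{1 + \sqrt{5}}{2} \right)^i + \frac{5 + \sqrt{5}}{10} \left( \frac{1 - \sqrt{5}}{2} \right)^i \right] \cdot 2^{-i}$$

$$= \frac{5 - \sqrt{5}}{10} \sum_{i=1}^{\infty} \left( \frac{1 + \sqrt{5}}{4} \right)^i + \frac{5 + \sqrt{5}}{10} \sum_{i=1}^{\infty} \left( \frac{1 - \sqrt{5}}{4} \right)^i$$

$$= \frac{5 - \sqrt{5}}{10} \cdot \frac{1 + \sqrt{5}}{3 - \sqrt{5}} + \frac{5 + \sqrt{5}}{10} \cdot \frac{1 - \sqrt{5}}{3 + \sqrt{5}}$$

$$= 1. \; ∎$$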
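The closed form and the value of the sum in Example 5.21 can be checked numerically. The sketch below also covers the recurrence with $n$ previous terms; for that case the initial values $\psi(i) = 0$ for $i < n$ and $\psi(n) = 1$ are an assumption chosen to agree with the case $n = 2$ above, and the function names are illustrative.

```python
from fractions import Fraction
from math import isclose, sqrt

def extended_fib(n, count):
    """psi(0), ..., psi(count-1): zero below n, psi(n) = 1, then the sum of the previous n terms."""
    psi = [0] * n + [1]
    while len(psi) < count:
        psi.append(sum(psi[-n:]))
    return psi[:count]

def psi_closed(i):
    """Closed form of Example 5.21, valid for i >= 1."""
    s = sqrt(5)
    return (5 - s) / 10 * ((1 + s) / 2) ** i + (5 + s) / 10 * ((1 - s) / 2) ** i

def partial_sum(n, count):
    """Exact partial sum of psi(i) * 2^-i over the first `count` terms."""
    return sum(Fraction(v, 2 ** i) for i, v in enumerate(extended_fib(n, count)))

fib2 = extended_fib(2, 31)
closed_form_matches = all(
    isclose(psi_closed(i), fib2[i], rel_tol=1e-9, abs_tol=1e-9) for i in range(1, 31)
)
gap2 = 1 - partial_sum(2, 400)   # what is still missing from 1 after 400 terms
gap3 = 1 - partial_sum(3, 400)
```

Exact rational arithmetic is used for the sums so that the shrinking gap to 1 is not blurred by floating point rounding; the closed form is compared in floating point only.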



As an aside for interested number theorists, notice that the sum in Example 5.21
holds for $\psi(i)$ any extended Fibonacci sequence.

Corollary 5.10.1. Let $\mathrm{fib}_n(i) = \mathrm{fib}_n(i-1) + \mathrm{fib}_n(i-2) + \ldots + \mathrm{fib}_n(i-n)$.
Then $\sum_{i=0}^{\infty} \mathrm{fib}_n(i) \cdot 2^{-i} = 1$.

Proof: Consider a binary tree $T_f$ of the form shown in Figure 5.9a, which has
internal node labels $0, 1, \ldots, n-1$ (for $0 \le i < n$, there is one node at depth $i$,
and that node has label $i$) and has one leaf at each of the depths $1, 2, 3, \ldots, n-1$
and two leaves at depth $n$. Consider the extension $T$ of $T_f$, as in Figure 5.9b. If a
copy of $T_f$ has a $\emptyset$-node at depth $i - k$, for $1 \le k \le n$, then that copy of $T_f$ has its
root at depth $i - n - k$ and thus has a node at depth $i - n$ which is not a $\emptyset$-node.
This node, not itself a $\emptyset$-node, must serve as the root of yet another copy of $T_f$
and this new copy of $T_f$ has a $\emptyset$-leaf at depth $i$. Thus

$$\psi(i) = \psi(i-1) + \psi(i-2) + \ldots + \psi(i-n).$$

But by Theorem 5.10 we know that the extension $T$ of $T_f$ corresponds to a
representation $\rho$ which achieves Kraft storage. Thus

$$\sum_{i=0}^{\infty} \psi(i) \cdot 2^{-i} = 1 = \sum_{i=0}^{\infty} \mathrm{fib}_n(i) \cdot 2^{-i}. \; ∎$$

5.4 Pointer Representations

Recall from Section 5.1 that a pointer representation has some function
$\beta: J \to \mathcal{B}^+$ which serves as a pointer and indicates the length $|d|$. Example 5.5b
gave an example of a pointer representation, and we now give a formal definition.

Definition. Let $\mathbb{D} = \bigcup_{i \in J} X^i$, let $f$ be a total function $f: \mathbb{D} \to \mathcal{B}^+$, and let
$\beta: J \to \mathcal{B}^+$ be a representation. Then a representation $\rho: \mathbb{D} \to \mathcal{B}^+$ which is
defined by

$$\rho(d) = \{f(d)\}_{n_1(d)} \cup \{\beta(|d|)\}_{n_2(d)}$$

is a pointer representation if

$$D(\{f(d)\}_{n_1(d)}) \cap D(\{\beta(|d|)\}_{n_2(d)}) = \emptyset,$$

where $n_1, n_2$ are functions, $n_1: \mathbb{D} \to \mathbb{N}$, $n_2: \mathbb{D} \to \mathbb{N}$. The function $f$ is the list
component of $\rho$ and $\beta$ is the pointer component of $\rho$. We refer to $\beta(|d|)$ as the
pointer of $\rho(d)$.

Note that the functions $n_1, n_2$ in the above definition are not the same functions as
the $n_i$ in the definition of a concatenation-preserving function. Before discussing
the pointer representation in more detail, let us present the following example in
order to illustrate the definition.

Example 5.22. Let X = {a,b,c,d}, 8 = {0,1}, and D = U X\ Define the function 

f:X -» # + by 

f(a) = 0_0 

f(b) =10_ 

f(c) =11_ 

f(d) = 0_1 

and then define the concatenation-preserving function f':D -* 8 by 

Idl 

r{d) = U{f(c/d))} 



The pointer P.-J -* 8 is defined by 



3(1-1)- 

1=1 
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A(i) = 1*0. 

Then the representation p-D -» 8 where 

is a (jointer representation. For instance, 

/>(abad) - 111100_010_0_00_1 
p{bdb) = 111010_0_110_ 
/>U) =0. 

Notice that 

\p\d)\ = |t/| + l + |f'U)l = 3-lt/l + l. 

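For $J = \mathbb{N}$, then $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and it is easy to verify that $\rho$ achieves Kraft storage:

$$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = \sum_{i \in J} \sum_{d \in X^i} 2^{-|\rho(d)|} = \sum_{i=0}^{\infty} |X|^i \cdot 2^{-(3i+1)} = \sum_{i=0}^{\infty} 2^{-(i+1)} = 1.$$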

Now consider answering the table lookup question $\gamma_i \in \Gamma$. The answer to the
question $\gamma_i$ is essentially found at memory locations beginning with cell $3(i-1)$,
except that we have stored the pointer in front of $f'(d)$, and so $f'(d)$ has been
displaced by $|d| + 1$ cells. Thus, the answer to $\gamma_i$, for $i \le |d|$, is found by reading
$w(|d|+1+3(i-1))$ and then reading either $w(|d|+1+3(i-1)+1)$ or $w(|d|+1+3(i-1)+2)$.
When $i > |d|$, we need only read the pointer to determine that the answer is $\emptyset$.
One possible algorithm to answer the question $\gamma_i$ therefore has the memory cell
access sequences:

0, 1, ..., |d|-1, |d|, |d|+3(i-1)+1, |d|+3(i-1)+2    if w(|d|+3i-2) = 1, |d| ≥ i

0, 1, ..., |d|-1, |d|, |d|+3(i-1)+1, |d|+3(i-1)+3    if w(|d|+3i-2) = 0, |d| ≥ i

0, 1, ..., |d|-1, |d|                                if |d| < i

This immediately tells us the total number of accesses made:

$$\#[\gamma_i(\rho(d))] = \begin{cases} |d| + 3 & \text{if } |d| \ge i \\ |d| + 1 & \text{if } |d| < i \end{cases}$$

∎
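The access counts of Example 5.22 can be checked mechanically. The following is a minimal sketch, not part of the original development: the names `F`, `encode` and `lookup` are our own, memory is modelled as a Python string, and the character `_` stands for a cell that is allocated to a field but never written.

```python
# Field codes of Example 5.22; '_' marks an unused cell inside a 3-cell field.
F = {"a": "0_0", "b": "10_", "c": "11_", "d": "0_1"}

# Inverse map: (first cell read, second cell read) -> list element.
DECODE = {("0", "0"): "a", ("1", "0"): "b", ("1", "1"): "c", ("0", "1"): "d"}


def encode(d):
    """Return rho(d): the unary pointer 1^|d| 0 followed by the fields."""
    return "1" * len(d) + "0" + "".join(F[x] for x in d)


def lookup(mem, i):
    """Answer gamma_i (1-based). Returns (answer, number of cell accesses)."""
    accesses = 0
    n = 0
    while True:                      # read the whole pointer: cells 0..|d|
        accesses += 1
        if mem[n] == "0":
            break
        n += 1
    if i > n:
        return None, accesses        # null answer after |d| + 1 accesses
    base = n + 1 + 3 * (i - 1)       # first cell of field i
    first = mem[base]
    second = mem[base + 1] if first == "1" else mem[base + 2]
    accesses += 2
    return DECODE[(first, second)], accesses
```

For a list of length 4 the fourth element costs 4 + 3 = 7 accesses, and any question beyond the end of a list of length 3 costs 3 + 1 = 4, in agreement with the case analysis above.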



The intuition behind the definition of a pointer representation is that we
encode the length so that in order to answer a question $\gamma_i$ we need only read the
pointer and can then look up the answer. In the case of the endmarker
representation, we were forced to actually read the list. The question remains,
however, why we chose to allow the pointer to encode $|d|$ rather than $|\rho(d)|$. If we
wish to be able to access individual list elements, as by asking the questions in $\Gamma$,
then it is reasonable to encode $|d|$. Reading the pointer will then at least tell us
immediately whether the answer to $\gamma_i$ is $\emptyset$ or not. On the other hand, if we wish
to perform the update operation of appending an element to the end of the list,
then it would be advantageous to know $|\rho(d)|$. If

$$(\forall d_1, d_2 \in \mathbb{D})\,(|d_1| = |d_2| \Leftrightarrow |f(d_1)| = |f(d_2)|)$$
then it of course makes no difference whether the pointer encodes $|d|$ or $|f(d)|$,
since we can determine one from the other.

Example 5.23. Reconsider the functions $f'$ and $\ell$ from Example 5.22 but define a
partial function $\ell': \mathbb{N} \to \mathcal{B}^+$ such that $D(\ell') = \{2i \mid i \in J\}$, the even natural
numbers, and $\ell'(n) = 1^{n/2} 0$. Notice that $\ell'$ is a representation. So the
representation $\rho': \mathbb{D} \to \mathcal{B}^+$ defined by

$$\rho'(d) = \{\ell'(|f'(d)|)\}_0 \cup \{f'(d)\}_{|d|+1}$$

is equivalent to the representation $\rho$ in Example 5.22 because

$$\ell'(|f'(d)|) = \ell(|d|).$$

Technically, however, the representation $\rho'$ is not a pointer representation because
$\ell': \{2i \mid i \in J\} \to \mathcal{B}^+$, whereas the definition requires that $\ell: J \to \mathcal{B}^+$. But since
$|D(\ell')| = |D(\ell)| = |J|$, we often find it convenient to loosely refer to $\rho'$ as a
pointer representation itself. ∎
We could have written the formal definition of a pointer to allow a mapping
$\ell': \mathbb{N} \to \mathcal{B}^+$ where $|D(\ell')| = |J|$, but we chose not to since the added generality
would make the definition statement more complex and would not improve our
results.

We shall, however, allow one conceptual extension for pointer representations.
Since we require the pointer and list components, $\ell(|d|)$ and $f(d)$, to be placed in
memory so as to not overlap, we may want to view them as being stored in separate
sections of memory. In other words, we could view $f(d)$ as being stored in memory
as usual and $\ell(|d|)$ as being stored in an auxiliary section of memory, perhaps
some sort of register. However, we shall not in general want to bound the size of
the pointer and we do not differentiate between the cost of a pointer access vs. the
cost of a list access, so it is easier to view the pointer as also being in memory. We
simply assume the memory manager allocates the list and the pointer separate areas.
Perhaps they are even interspersed, but we do not want to have to alter our coding
schemes to take this into account. Therefore for numbering simplicity we may
choose to allow both the list and the pointer to begin at cell number 0 and just note
that the representations are separate and therefore disjoint. In this way, the storage
of $f(d)$ in memory does not have to depend on the memory location of $\ell(|d|)$.

Definition. Let $f$ be a total function $f: \mathbb{D} \to \mathcal{B}^+$, and let $\ell$ be a representation
$\ell: J \to \mathcal{B}^+$. Assume that $f(d)$ and $\ell(|d|)$ are stored in separate sections of
memory. Then we refer to a pointer representation $\rho: \mathbb{D} \to \mathcal{B}^+$ formed from $\ell$
and $f$ as a separate pointer representation and write

$$\rho(d) = f(d) \cup \ell(|d|).$$

In order to avoid possible confusion, when we have in mind a separate pointer
representation, we shall explicitly say so.
Example 5.24. Reconsider the pointer representation $\rho$ of Example 5.22, but
assume that the pointer and the list components are stored in separate memory
sections. So $\rho$ is a separate pointer representation, and we denote it by

$$\rho(d) = \ell(|d|) \cup f'(d).$$

Certainly we have not altered the storage costs from those of Example 5.22, but it is
possible to implement each $\gamma_i$ in such a way that we decrease the access costs.
Possible access sequences for $\gamma_i \in \Gamma$ are:

0, 1, ..., i-1, 3(i-1), 3(i-1)+1    if w(3(i-1)) = 1, |d| ≥ i

0, 1, ..., i-1, 3(i-1), 3(i-1)+2    if w(3(i-1)) = 0, |d| ≥ i

0, 1, ..., |d|-1, |d|               if |d| < i

Thus we have for the total number of accesses:

$$\#[\gamma_i(\rho(d))] = \begin{cases} i + 2 & \text{if } |d| \ge i \\ |d| + 1 & \text{if } |d| < i \end{cases}$$

Notice that this represents an improvement over the access costs we previously had. ∎
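To make the saving concrete, the sketch below answers a table lookup question against a separate pointer section, reading only as many pointer cells as are needed to decide whether $|d| \ge i$. It is an illustration under our own assumptions: the function name `lookup_separate` and the modelling of the two memory sections as two Python strings are not from the text, and `_` again marks an unused cell.

```python
# Inverse of the field code of Example 5.22:
# (first cell read, second cell read) -> list element.
DECODE = {("0", "0"): "a", ("1", "0"): "b", ("1", "1"): "c", ("0", "1"): "d"}


def lookup_separate(pointer, fields, i):
    """Answer gamma_i (1-based) with the pointer 1^|d| 0 and the 3-cell
    fields held in separate sections. Returns (answer, accesses)."""
    accesses = 0
    for cell in range(i):            # pointer cells 0 .. i-1
        accesses += 1
        if pointer[cell] == "0":     # endmarker found, so |d| = cell < i
            return None, accesses
    base = 3 * (i - 1)               # field i starts here in the list section
    first = fields[base]
    second = fields[base + 1] if first == "1" else fields[base + 2]
    accesses += 2
    return DECODE[(first, second)], accesses
```

With the list abad the second element now costs 2 + 2 = 4 accesses rather than 4 + 3 = 7, since the pointer is abandoned as soon as $i$ ones have been seen.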

Although we shall not in general concern ourselves with the way in which separate 
memory sections are allocated, let us note, in the context of this same example, one 
possible scheme. 

Example 5.25. Let $f$ be defined as in Example 5.22, but now define the
concatenation-preserving representation $f_1: \mathbb{D} \to \mathcal{B}^+$ by

$$f_1(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{4(i-1)}.$$

If we view the pointer $\ell_1$ as being "scattered", we may define

$$\ell_1(i) = \{(3+4j, 1) \mid 0 \le j < i\} \cup \{(3+4i, 0)\}.$$

Then the pointer representation $\rho_1: \mathbb{D} \to \mathcal{B}^+$ is defined by

$$\rho_1(d) = \{\ell_1(|d|)\}_0 \cup \{f_1(d)\}_0.$$

For instance,

ρ_1(abad) = 0_0110_10_010_11___0
ρ_1(bdb)  = 10_10_1110_1___0
ρ_1(λ)    = ___0.

Since we are counting only the actual number of cells occupied, the storage has not
been altered. For $\gamma_i \in \Gamma$ we have the following access sequences:

3, 7, 11, ..., 3+4(i-1), 4i-4, 4i-3    if w(4i-4) = 1, |d| ≥ i

3, 7, 11, ..., 3+4(i-1), 4i-4, 4i-2    if w(4i-4) = 0, |d| ≥ i

3, 7, 11, ..., 3+4(|d|-1), 3+4|d|      if |d| < i

This gives us the same total number of accesses as we had in Example 5.24, where
we simply made the assumption that we had separate memory sections. ∎

Example 5.25 illustrates an encoding for a separate pointer scheme. Notice that this 
encoding did not affect the order in which memory cell contents were determined; it 
simply altered the memory cell numbers in which this information was found. We 
can show in general that there is no harm in using a separate pointer scheme if it 
makes our coding job easier, because for any separate pointer representation p 
there is a pointer representation p' without a separate pointer that achieves the 
same storage and access costs. 

Theorem 5.11. Given any pointer representation $\rho$ with a separate pointer,
there exists a pointer representation $\rho'$ without a separate pointer such that

$|\rho(d)| = |\rho'(d)|$ for all $d \in \mathbb{D}$
and $\#[f_i(\rho(d))] = \#[f_i(\rho'(d))]$ for any operation $f_i$.

Proof: Suppose the representation $\rho: \mathbb{D} \to \mathcal{B}^+$ has a separate pointer and is defined
by

$$\rho(d) = f(d) \cup \ell(|d|).$$

We can define a representation $\rho': \mathbb{D} \to \mathcal{B}^+$ without a separate pointer by

$$\rho'(d) = f'(d) \cup \ell'(|d|)$$

where $f'(d) = \{(2n, w(n)) \mid n \in D(f(d))\}$
and $\ell'(|d|) = \{(2n+1, w(n)) \mid n \in D(\ell(|d|))\}$.

Since $D(f'(d)) \cap D(\ell'(|d|)) = \emptyset$,
it is clear that $|\rho(d)| = |\rho'(d)|$ for all $d \in \mathbb{D}$. Also, any access sequence to perform
an operation $f_i$ using $\rho'$ can be mapped in an obvious way to an access sequence to
perform $f_i$ using representation $\rho$. ∎
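The construction in this proof is easy to state operationally. The sketch below models each memory section as a dictionary from cell numbers to contents (a partial function, as in the text); the names `interleave` and `translate` are our own and purely illustrative.

```python
def interleave(list_part, pointer_part):
    """Merge two separate sections into one memory: list cell n moves to
    cell 2n, pointer cell n moves to cell 2n + 1."""
    merged = {2 * n: v for n, v in list_part.items()}
    merged.update({2 * n + 1: v for n, v in pointer_part.items()})
    return merged


def translate(section, n):
    """Map an access to cell n of a section onto the merged memory."""
    return 2 * n if section == "list" else 2 * n + 1


# Example: the list section holds f(a) = 0_0 (cell 1 unused), the pointer is 10.
list_part = {0: "0", 2: "0"}
pointer_part = {0: "1", 1: "0"}
merged = interleave(list_part, pointer_part)
```

Because even and odd cells never collide, the merged memory occupies exactly as many cells as the two sections together, and every access sequence translates cell by cell without changing its length.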

Recall that no endmarker representation can achieve Kraft storage for finite $\mathbb{D}$.
This is not the case for pointer representations, as the following example shows.

Example 5.26. Let $X = \{a,b\}$, $\mathcal{B} = \{0,1\}$, and $\mathbb{D} = \bigcup_{i \in \{0,1,2,3\}} X^i$. Define the
function $f: X \to \mathcal{B}^*$ by

f(a) = 0
f(b) = 1

and the concatenation-preserving function $f': \mathbb{D} \to \mathcal{B}^*$ by

$$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{i-1}.$$

Let the pointer $\ell: \{0,1,2,3\} \to \mathcal{B}^*$ be defined so that

ℓ(0) = 00
ℓ(1) = 01
ℓ(2) = 10
ℓ(3) = 11

Then we define the representation $\rho: \mathbb{D} \to \mathcal{B}^*$ by

$$\rho(d) = \{\ell(|d|)\}_0 \cup \{f'(d)\}_2.$$

For instance,

ρ(λ)   = 00
ρ(a)   = 010
ρ(b)   = 011
ρ(aa)  = 1000
ρ(abb) = 11011

The representation $\rho$ achieves Kraft storage, because

$$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = \sum_{i \in J} \sum_{d \in X^i} 2^{-|\rho(d)|} = \sum_{i=0}^{3} 2^i \cdot 2^{-(i+2)} = 1.$$

∎
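Since the domain of Example 5.26 is finite, the Kraft sum can be verified by brute-force enumeration. The following sketch uses exact rational arithmetic; the names `POINTER`, `rho`, `domain` and `kraft` are our own.

```python
from fractions import Fraction
from itertools import product

POINTER = {0: "00", 1: "01", 2: "10", 3: "11"}   # the pointer of Example 5.26
F = {"a": "0", "b": "1"}                          # one cell per list element


def rho(d):
    """Two-cell pointer followed by the list, one cell per element."""
    return POINTER[len(d)] + "".join(F[x] for x in d)


# All lists over {a, b} of length 0, 1, 2 or 3.
domain = ["".join(t) for n in range(4) for t in product("ab", repeat=n)]
codes = [rho(d) for d in domain]

# Exact Kraft sum over the whole domain.
kraft = sum(Fraction(1, 2 ** len(c)) for c in codes)
```

The sum comes out to exactly 1, each length class $X^i$ contributing $2^i \cdot 2^{-(i+2)} = 1/4$.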

So we know from Examples 5.22 and 5.26 that a pointer representation may achieve
Kraft storage for $\mathbb{D}$ infinite or finite.

Let us try to determine under what conditions a pointer representation $\rho$ does
achieve Kraft storage. The following theorem shows that the pointer $\ell$ must itself
achieve Kraft storage in order for $\rho$ to achieve Kraft storage.

Theorem 5.12. Let $f$ be a total function $f: \mathbb{D} \to \mathcal{B}^+$ and let $\ell: J \to \mathcal{B}^+$ be a
representation which does not achieve Kraft storage. Then the pointer
representation $\rho: \mathbb{D} \to \mathcal{B}^+$, where

$$\rho(d) = \{f(d)\}_{n_1(d)} \cup \{\ell(|d|)\}_{n_2(d)},$$

does not achieve Kraft storage.

Proof: Since $\ell$ is a representation which does not achieve Kraft storage,

$$\sum_{k \in J} |\mathcal{B}|^{-|\ell(k)|} < 1.$$

We first show that the theorem holds for a separate pointer representation
$\rho': \mathbb{D} \to \mathcal{B}^+$, where

$$\rho'(d) = f(d) \cup \ell(|d|).$$

Assume that the representation $\rho'$ does attain Kraft storage. Then

$$1 = \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho'(d)|} = \sum_{k \in J} \Big( |\mathcal{B}|^{-|\ell(k)|} \cdot \sum_{d \in X^k} |\mathcal{B}|^{-|f(d)|} \Big).$$

Thus, there exists $k \in J$ such that

$$\sum_{d \in X^k} |\mathcal{B}|^{-|f(d)|} > 1.$$

So $f$ is not a representation and there exist $d_1, d_2 \in X^k$ such that $f(d_1)$ and $f(d_2)$
are indistinguishable. But since $|d_1| = |d_2| = k$,

$\rho'(d_1) = f(d_1) \cup \ell(k)$
and $\rho'(d_2) = f(d_2) \cup \ell(k)$

are indistinguishable, contradicting the fact that $\rho'$ is a representation. Thus, $\rho'$
cannot achieve Kraft storage if $\ell$ doesn't. Since

$$|\rho'(d)| = |f(d)| + |\ell(|d|)| = |\rho(d)|,$$

then $\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho'(d)|} \ne 1$ implies that $\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} \ne 1$.
So the pointer representation $\rho$ cannot achieve Kraft storage if $\ell$ doesn't. ∎

Thus, the pointer $\ell$ achieving Kraft storage is a necessary, although certainly not
sufficient, condition for the pointer representation $\rho$ to achieve Kraft storage.

We frequently consider a pointer representation formed from a
concatenation-preserving function $f'$ and a pointer $\ell$. We now show that whenever
that concatenation-preserving function $f'$ is based on a function $f: X \to \mathcal{B}^+$ which
itself is a representation and attains Kraft storage, then the pointer representation $\rho$
also achieves Kraft storage, assuming, of course, that the pointer $\ell$ achieves Kraft
storage.

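Theorem 5.13. Let $\mathbb{D} = \bigcup_{i \in J} X^i$ and consider a representation function
$f: X \to \mathcal{B}^+$ which achieves Kraft storage. Let $f': \mathbb{D} \to \mathcal{B}^+$ be a
concatenation-preserving function formed from $f$ and defined by

$$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)}.$$

If the representation $\ell: J \to \mathcal{B}^+$ attains Kraft storage, then the pointer
representation $\rho: \mathbb{D} \to \mathcal{B}^+$ also achieves Kraft storage, where

$$\rho(d) = \{f'(d)\}_{n_1(d)} \cup \{\ell(|d|)\}_{n_2(d)}.$$

Proof: Since $|\rho(d)| = |f'(d)| + |\ell(|d|)|$,

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-(|f'(d)| + |\ell(|d|)|)} = \sum_{i \in J} \Big( |\mathcal{B}|^{-|\ell(i)|} \cdot \sum_{d \in X^i} |\mathcal{B}|^{-|f'(d)|} \Big).$$

Since $f$ achieves Kraft storage we can make use of Lemma 5.4, so that each inner sum equals 1, which gives us

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = \sum_{i \in J} |\mathcal{B}|^{-|\ell(i)|} = 1.$$

∎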

We now want to determine the conditions, if any, under which a pointer
representation can achieve Kraft access for the set $\Gamma$ of table lookup questions and
also achieve Kraft storage.

Theorem 5.14. Let $\mathbb{D} = \bigcup_{i \in J} X^i$. If a pointer representation $\rho: \mathbb{D} \to \mathcal{B}^+$
achieves Kraft storage and also achieves Kraft access for all $\gamma_i \in \Gamma$, then
$|\mathcal{B}| = 2$ and $\mathbb{D} = \{\lambda\} \cup X^n$ for some $n \in \mathbb{N}^+$.

Proof: Theorem 5.12 guarantees that if $\rho$ achieves Kraft storage, then its pointer
function $\ell: J \to \mathcal{B}^+$ must also achieve Kraft storage. Since $|\mathcal{B}| \ge 2$, it must be the
case that $|J| \ge 2$. Thus, $\mathbb{D} \ne X^n$. Recalling Theorems 4.9 and 4.10, we know that if
a representation $\rho$ achieves Kraft storage and Kraft access for all $\gamma_i \in \Gamma$, then
$\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X^n$. Since the former is not true, the only possibility is that
$\mathbb{D} = \{\lambda\} \cup X^n$. So if we are to achieve Kraft storage and access at all, then $|J| = 2$
and therefore $|\mathcal{B}| = 2$. ∎

Theorem 5.14 simply says that if a pointer representation is to achieve Kraft storage
and access, then $|\mathcal{B}| = 2$ and $\mathbb{D} = \{\lambda\} \cup X^n$. It does not necessarily say that it is
possible to ever achieve both. The following example shows, however, that it is
possible.
Example 5.27. Let $X = \{a,b,c\}$, $\mathcal{B} = \{0,1\}$, and $\mathbb{D} = \{\lambda\} \cup X^3$. We want to
construct a pointer representation $\rho$ which achieves Kraft storage and also achieves
Kraft access for $\Gamma = \{\gamma_1, \gamma_2, \gamma_3\}$. To do so, we first define a function $f: X \to \mathcal{B}^+$:

f(a) = 00
f(b) = 01
f(c) = 1_

We then let $f': \mathbb{D} \to \mathcal{B}^+$ be the concatenation-preserving function formed from $f$:

$$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{2(i-1)}.$$

The pointer function $\ell: J \to \mathcal{B}^+$ is defined by

ℓ(0) = 0
ℓ(3) = 1

Then the pointer representation $\rho: \mathbb{D} \to \mathcal{B}^+$ can be defined by

$$\rho(d) = \{\ell(|d|)\}_0 \cup \{f'(d)\}_1.$$

The representation $\rho$ achieves Kraft storage, because

$$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = \sum_{d \in \mathbb{D}} 2^{-(|f'(d)| + |\ell(|d|)|)} = 2^{-1} \sum_{d \in \{\lambda\} \cup X^3} 2^{-|f'(d)|}$$

$$= 2^{-1} \left(2^{-0} + 2^{-3} + 6 \cdot 2^{-4} + 12 \cdot 2^{-5} + 8 \cdot 2^{-6}\right) = 1.$$

We can construct access trees for $\gamma_1$, $\gamma_2$, $\gamma_3$ as shown in Figure 5.11. By
observation, each achieves Kraft access. ∎

This example can be generalized, giving us the following theorem. 

Theorem 5.15. Consider any domain of the form $\mathbb{D} = \{\lambda\} \cup X^n$, $|X| > 1$,
and assume that $|\mathcal{B}| = 2$. Then there is a concatenation-preserving pointer
representation $\rho: \mathbb{D} \to \mathcal{B}^+$ which achieves Kraft storage and for which it is
possible to implement the table lookup questions $\Gamma = \{\gamma_i \mid 1 \le i \le n\}$ so that
each $\gamma_i$ achieves Kraft access.

Figure 5.11. Access trees for $\gamma_1$, $\gamma_2$, $\gamma_3$ of Example 5.27.

Proof: The construction is like that in Example 5.27. We first define a function
$f: X \to \mathcal{B}^+$ such that $f$ achieves Kraft storage. It is possible to do this since there
exists $n_1 \in \mathbb{N}$ such that $|X| = (|\mathcal{B}| - 1) \cdot n_1 + 1 = n_1 + 1$. A tree $T$ for $f$ has $n_1$
internal nodes, for which we can choose labels from the set $\{0, 1, \dots, n_1 - 1\}$. We
now define the concatenation-preserving function $f': \mathbb{D} \to \mathcal{B}^+$ formed from $f$:

$$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_1(i-1)}.$$

The pointer function $\ell: J \to \mathcal{B}^+$ is defined by

ℓ(0) = 0
ℓ(n) = 1

From these we define the pointer representation $\rho: \mathbb{D} \to \mathcal{B}^+$:

$$\rho(d) = \{\ell(|d|)\}_0 \cup \{f'(d)\}_1.$$

By Theorem 5.13, since $f$ and $\ell$ achieve Kraft storage, so does $\rho$. Also, if $T$ is the
full tree corresponding to $f$, then the access tree for any $\gamma_i \in \Gamma$ is of the form
shown in Figure 5.12. Thus, $\rho$ achieves both Kraft access and Kraft storage. ∎



We have seen by Theorem 5.14 that only for $|\mathcal{B}| = 2$, $\mathbb{D} = \{\lambda\} \cup X^n$ can a
pointer representation achieve Kraft storage and access. Let's try to see when it is at
least possible to achieve Kraft access. The following example presents such a
scheme, but the resulting pointer storage cost is high: $|J| - 1$.

Figure 5.12. Access tree for $\gamma_i$ in proof of Theorem 5.15.

Example 5.28. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b,c,d\}$, and $\mathbb{D} = \bigcup_{i \in J} X^i$ for $J = \{0,3,5,6\}$.
Let $f: X \to \mathcal{B}^+$ be defined by

f(a) = 00
f(b) = 01
f(c) = 10
f(d) = 11

and define from $f$ the concatenation-preserving representation $f': \mathbb{D} \to \mathcal{B}^+$ so that

$$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{2(i-1)}.$$

(a) Define a length function $\ell_1: J \to \mathcal{B}^+$ by

$$\ell_1(|d|) = 1^{|d|} 0^{\max J - |d|} = 1^{|d|} 0^{6 - |d|}.$$

Then a pointer representation $\rho_1: \mathbb{D} \to \mathcal{B}^+$ can be defined by

$$\rho_1(d) = \{f'(d)\}_6 \cup \{\ell_1(|d|)\}_0.$$

Notice that $|\rho_1(d)| = |f'(d)| + 6 = 2|d| + 6$.

Since $\ell_1$ does not achieve Kraft storage we know that $\rho_1$ does not either. On the
other hand, we can implement each $\gamma_i \in \Gamma$ so as to achieve Kraft access. We do
this by first reading the $i$-th bit of $\ell_1(|d|)$, which is stored in cell $i - 1$. If $w(i-1) = 0$, then we know $|d| < i$ and
so $\gamma_i(\rho(d)) = \emptyset$. On the other hand, if $w(i-1) = 1$ then we know $\gamma_i(\rho(d)) \ne \emptyset$
and we look in locations $2(i-1) + 6 = 2i + 4$ and $2i + 5$ in order to determine
$\gamma_i(\rho(d))$. Thus, each $\gamma_i$ can be implemented by an access tree as shown in Figure
5.13.

Figure 5.13. Access tree for $\gamma_i$ of Example 5.28a.

(b) Recalling Theorem 4.14 leads us to try to find a length function $\ell_2$, where
$|\ell_2(|d|)| = |J| - 1$. Since we know, for instance, that $X^4 \not\subseteq \mathbb{D}$, then
$\gamma_4(\rho(d)) = \emptyset$ if and only if $\gamma_5(\rho(d)) = \emptyset$. So we define the length function
$\ell_2: J \to \mathcal{B}^+$ by

$$\ell_2(|d|) = \begin{cases} 000 & \text{if } |d| = 0 \\ 100 & \text{if } |d| = 3 \\ 110 & \text{if } |d| = 5 \\ 111 & \text{if } |d| = 6 \end{cases}$$

Then the pointer representation $\rho_2: \mathbb{D} \to \mathcal{B}^+$ is defined by

$$\rho_2(d) = \{f'(d)\}_3 \cup \{\ell_2(|d|)\}_0.$$

Once again, $\rho_2$ cannot achieve Kraft storage since $\ell_2$ doesn't. But we can
implement each $\gamma_i \in \Gamma$ so as to achieve Kraft access, as shown in Figure 5.14.
Notice that, for all $d \in \mathbb{D}$, $|\rho_2(d)| \ge 3 = |J| - 1$, as required by Theorem 4.14. ∎

Figure 5.14. Access trees for all $\gamma_i \in \Gamma$ from Example 5.28b.



The method used in Example 5.28b can be generalized so that it is always possible,
when $|\mathcal{B}| = 2$, to construct a pointer representation that achieves Kraft access.
Theorem 5.16. Let $\mathbb{D} = \bigcup_{i \in J} X^i$, and let $|\mathcal{B}| = 2$. Then it is possible to
construct a concatenation-preserving pointer representation $\rho: \mathbb{D} \to \mathcal{B}^+$ such
that each $\gamma_i \in \Gamma$ can be implemented so as to achieve Kraft access.

Proof: Construct some representation $f: X \to \mathcal{B}^*$ such that $f$ achieves Kraft storage.
Since $|\mathcal{B}| = 2$, it is always possible to do this; $f$ corresponds to some full tree $T$. Let
$k = \max_{x \in X} |f(x)|$ and define the concatenation-preserving function $f': \mathbb{D} \to \mathcal{B}^+$ by

$$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{k(i-1)}.$$

We define the length function $\ell: J \to \mathcal{B}^+$ in such a way that $|\ell(i)| = |J| - 1$, for all
$i \in J$. First, index the elements in $J$ so that $J = \{i_0, i_1, i_2, \dots\}$, where $i_j < i_{j+1}$.
Then define

$$\ell(i_n) = 1^n 0^{|J| - 1 - n}.$$

The separate pointer representation $\rho: \mathbb{D} \to \mathcal{B}^+$ defined by

$$\rho(d) = f'(d) \cup \ell(|d|)$$

can be implemented so as to achieve Kraft access. For instance, $\gamma_j \in \Gamma$ can be
implemented as follows. Determine the least value $i_n \in J$ such that $j \le i_n$. Then an
access to cell $n-1$ of the pointer indicates whether or not $\gamma_j(\rho(d)) = \emptyset$:

$w(n-1) = 0 \Rightarrow \gamma_j(\rho(d)) = \emptyset$
and $w(n-1) = 1 \Rightarrow \gamma_j(\rho(d)) \ne \emptyset$

If $\gamma_j(\rho(d)) \ne \emptyset$, then we can go to cell $k(j-1)$ of the list function $f'(d)$. Figure
5.15 illustrates an access tree for $\gamma_j$, where the nodes of $T$ correspond to memory
cells of the list component $f'(d)$. ∎
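The pointer of this construction needs only a single access per question. The sketch below illustrates this for a finite set $J$ of permitted lengths; the names `make_pointer` and `is_null` are our own, and the two boundary cases in which no access at all is needed are an assumption we add for completeness.

```python
def make_pointer(J):
    """Map each permitted length i_n in J (sorted) to the string
    1^n 0^(|J|-1-n) used as its pointer."""
    idx = sorted(J)
    return {length: "1" * n + "0" * (len(idx) - 1 - n)
            for n, length in enumerate(idx)}


def is_null(pointer, J, j):
    """Decide whether gamma_j (j >= 1) has a null answer.
    Returns (answer_is_null, number of pointer accesses)."""
    idx = sorted(J)
    n = next((m for m, length in enumerate(idx) if j <= length), None)
    if n is None:
        return True, 0               # j exceeds every permitted length
    if n == 0:
        return False, 0              # every list has at least j elements
    return pointer[n - 1] == "0", 1  # one access to pointer cell n - 1
```

For $J = \{0, 3, 5, 6\}$ this reproduces the strings 000, 100, 110, 111 of Example 5.28b. With a list of length 5, the question $\gamma_4$ reads cell 1 of the pointer 110 and finds a 1, while $\gamma_6$ reads cell 2 and finds a 0.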

Although the pointer representation constructed in Theorem 5.16 can achieve Kraft
access, this is at a potentially very high storage cost, since for all $d \in \mathbb{D}$,
$|\rho(d)| \ge |J| - 1$. Unfortunately, by Theorem 4.14 we know that we cannot
uniformly improve this storage. In other words, if we insist on Kraft access for all
$\gamma_i \in \Gamma$, then we are stuck with $|\rho(d)| \ge |J| - 1$.

Figure 5.15. Access tree for $\gamma_j \in \Gamma$ in the proof of Theorem 5.16. (Note: cell $n-1$ is in the pointer representation.)

Theorem 5.15 presented a method for constructing a pointer representation so
as to achieve Kraft access, but it was only for the case $|\mathcal{B}| = 2$. This leads us to
wonder whether it is possible to extend the result to $|\mathcal{B}| > 2$. The following theorem
shows that, for $|\mathcal{B}| > 2$, it is not possible with a pointer representation to implement
each $\gamma_i \in \Gamma$ so as to achieve Kraft access, unless the pointer component is
"superfluous".

Theorem 5.17. Let $|\mathcal{B}| > 2$ and consider a function $f: \mathbb{D} \to \mathcal{B}^+$. Let $\rho: \mathbb{D} \to \mathcal{B}^+$
be a pointer representation

$$\rho(d) = f(d) \cup \ell(|d|),$$

where $\ell$ is a representation $\ell: J \to \mathcal{B}^+$. If $f$ is not by itself a representation of
$\mathbb{D}$, then $\rho$ does not achieve Kraft access for all $\gamma_i \in \Gamma$.

Proof: Let the function $f$ not be a representation, and assume $\rho$ does achieve Kraft
access for all $\gamma_i \in \Gamma$. Since $f$ is not a representation, there exists $\gamma_k \in \Gamma$ such that
the access tree for $\gamma_k$, $T$, has an internal node labelled $r \in D(\ell(|d|))$. By Theorem
4.1 and Corollary 4.2.1, since $\gamma_k$ achieves Kraft access, it has $|X| + 1$ leaves with
distinct labels from the set $X \cup \{\emptyset\}$ (or $|X|$ leaves if $\mathbb{D} = X^n$) and the node $r$ has
$|\mathcal{B}|$ branches. Let one of the branches from node $r$ eventually lead to some leaf
labelled $x_1 \in X$ and another branch from $r$ eventually lead to a leaf labelled
$x_2 \in X$. There is some $d_1 \in \mathbb{D}$ such that $d_1(k) = x_1$, $r \in \{C[\gamma_k(\rho(d_1))]\}$, and
$r \in D(\ell(|d_1|))$, where $w_1(r) = b \in \mathcal{B}$ for $w_1 \supseteq \rho(d_1)$. Let

$$d_2 = \{(n, d_1(n)) \mid 1 \le n \le |d_1|,\ n \ne k\} \cup \{(k, x_2)\}.$$

In other words $d_2$ differs from $d_1$ only in its $k$-th element. By the definition of $\mathbb{D}$,
$d_1 \in \mathbb{D}$ implies that $d_2 \in \mathbb{D}$. Then $w_2(r) = b' \in \mathcal{B}$, where $b' \ne b$, for
$w_2 \supseteq \rho(d_2)$. Since $|d_1| = |d_2|$, $\ell(|d_1|) = \ell(|d_2|)$ and so $w_1(r) = w_2(r)$, since
$r \in D(\ell(|d_1|))$. This gives a contradiction. Thus, $\rho$ cannot achieve Kraft access
for all $\gamma_i \in \Gamma$. ∎

Thus, if a pointer representation achieves Kraft access for all $\gamma_i \in \Gamma$, then the list
component $f$ was itself a representation and so we need not have stored any pointer
at all. Effectively, this says that it is impossible for all $\gamma_i \in \Gamma$ to achieve Kraft
access with a pointer representation in which the pointer is in fact needed to store
length information. Certainly, it is not possible for a concatenation-preserving
pointer representation to achieve Kraft access, since a concatenation-preserving
function $f$ is not a representation (except in the trivial case where $X^k \not\subseteq \mathbb{D}$ for
$k \ge 2$).

Corollary 5.17.1. Let $\mathbb{D} = \bigcup_{i \in J} X^i$, where $\max J > 1$. If $|\mathcal{B}| > 2$, no
concatenation-preserving pointer representation can achieve Kraft access for all
$\gamma_i \in \Gamma$.

We have seen that for a pointer representation we in general cannot hope to 
achieve Kraft access. On the other hand, we know that we can actually achieve 
Kraft storage. So let us discuss how well we can do for access costs if we insist on 
Kraft storage. This is the approach we take for the rest of this section, and we 
shall see that pointer representations can, in fact, be quite efficient in terms of 
access as well as storage costs. 

Recall the pointer representation scheme used in Example 5.24. Since the list
component $f'$ had fixed position fields, we could immediately (and with Kraft
access) determine the answer to any $\gamma_i \in \Gamma$, as soon as we knew the answer was not
$\emptyset$. So in order to answer a table lookup question $\gamma_i$, we read enough of the
pointer to know whether or not $|d| \ge i$. Since the pointer function $\ell: J \to \mathcal{B}^+$ was
defined by $\ell(|d|) = 1^{|d|} 0$, this meant we had to read $i$ bits of the pointer for
$|d| \ge i$ and $|d| + 1$ bits of the pointer for $|d| < i$. We shall present a scheme to
reduce the length $|\ell(|d|)|$, which therefore reduces the cost of accessing the pointer.

For $|\mathcal{B}| = 2$ we saw in Example 5.22 the pointer representation, where
$\ell(n) = 1^n 0$; this is essentially a unary representation of $n$ followed by an
endmarker. It would, of course, be desirable to somehow represent $n$ in binary,
which would decrease the storage cost but would generate the problem of detecting
the end of the string; i.e., we need some way to guarantee that $\ell$ is a
representation. Since $D(\ell) = \mathbb{N}$, we use a universal encoding method as described
by Elias [7]. In this scheme we successively encode, in binary, the length of the
result of the previous encoding. For instance, we could represent $|d|$ as a binary
string $s$, which would have length $|s| \approx \log_2 |d|$. If we were to use, say, a unary
encoding to specify $|s|$, then we could write $\ell(|d|) = 0^{|s|} 1 s$, which gives us

$$|\ell(|d|)| = 2|s| + 1 \approx 2 \log_2 |d| + 1,$$

an improvement for large $|d|$ over our previous scheme's cost, where we had
$|\ell(|d|)| = |d| + 1$. In the following example we present an encoding scheme for the
case $|\mathcal{B}| = 2$.

Example 5.29. Recall the fixed position field concatenation-preserving function $f'$
from Example 5.22. Our concern here is with finding an efficient length
representation $\ell: J \to \mathcal{B}^+$. Assume for simplicity that $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. Rather than
defining $\ell(|d|) = 1^{|d|} 0$, as we did in Example 5.22, consider representing $|d| = n$ as
a binary string as follows:

   n    h(n)
   0    λ
   1    0
   2    1
   3    00
   4    01
   5    10
   6    11
   7    000
   8    001

More formally, we can define $h: \mathbb{N} \to \mathcal{B}^*$ by letting $h(n)$ be the binary
representation of $n + 1$, with the leftmost symbol deleted. For example, to
determine $h(21)$, we write 22 in binary, 10110, and then delete the leftmost symbol
(always a 1): $h(21) = 0110$ (see Table 5.1). Notice that

$$|h(n)| = \lfloor \log_2 (n+1) \rfloor.$$

We now define a pointer representation $\ell^1: \mathbb{N} \to \mathcal{B}^*$ by

$$\ell^1(n) = 0^{|h(n)|} \cdot 1 \cdot h(n),$$

as also shown in Table 5.1. The storage cost for the representation $\ell^1$ is

$$|\ell^1(n)| = 2|h(n)| + 1 = 2 \lfloor \log_2 (n+1) \rfloor + 1.$$

We can show that the representation $\ell^1$ achieves Kraft storage by noting that, for
each $j \in \mathbb{N}$, $\lfloor \log_2 (n+1) \rfloor = j$ for $2^j$ consecutive values of $n$:

$$\sum_{n=0}^{\infty} 2^{-|\ell^1(n)|} = \sum_{n=0}^{\infty} 2^{-(2 \lfloor \log_2 (n+1) \rfloor + 1)} = \sum_{j=0}^{\infty} 2^j \cdot 2^{-(2j+1)} = 1.$$

Thus, a worst case access cost to determine whether or not $\gamma_i(d) = \emptyset$ is just
$2 \lfloor \log_2 (n+1) \rfloor + 1$, an improvement over the scheme in Example 5.22 (or Example
5.24), which had a worst case of $n + 1$. In general, we can expect to do even better
than this, reading only as much of the pointer as necessary. Because only two
accesses of the list representation are required to read the answer $\gamma_i(d)$ for this
particular example, we have the following access costs:

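$$\#[\gamma_i(d)] = \begin{cases} \lceil \log_2 (i+2) \rceil + 2 & \text{for } i < 2^{\lceil \log_2 (n+2) \rceil - 1} - 1 \\ 2 \lfloor \log_2 (n+1) \rfloor + 3 & \text{for } 2^{\lceil \log_2 (n+2) \rceil - 1} - 1 \le i \le 2^{\lceil \log_2 (n+2) \rceil} - 2 \\ \lceil \log_2 (n+2) \rceil & \text{for } i \ge 2^{\lceil \log_2 (n+2) \rceil} - 1. \end{cases}$$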
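The functions $h$ and $\ell^1$ of this example are short enough to state in executable form. The sketch below is an illustration only; the names `h`, `ell1` and `decode_ell1` are our own, strings of the characters 0 and 1 stand for memory contents, and the decoder is an addition of ours showing that the end of the pointer can be detected without an endmarker.

```python
from fractions import Fraction


def h(n):
    """Binary representation of n + 1 with the leftmost 1 deleted."""
    return bin(n + 1)[3:]            # bin() yields '0b1...'; drop '0b1'


def ell1(n):
    """Unary length of h(n), then the separator 1, then h(n) itself."""
    return "0" * len(h(n)) + "1" + h(n)


def decode_ell1(bits):
    """Read one pointer from the front of bits.
    Returns (n, number of cells consumed)."""
    m = bits.index("1")              # m leading zeros give |h(n)|
    body = bits[m + 1:2 * m + 1]     # the next m cells are h(n)
    return int("1" + body, 2) - 1, 2 * m + 1


# Partial Kraft sum over all n whose pointer has at most 19 cells.
partial = sum(Fraction(1, 2 ** len(ell1(n))) for n in range(2 ** 10 - 1))
```

The partial sum over the first ten length classes is $1 - 2^{-10}$, each class $j$ contributing $2^{-(j+1)}$, which is consistent with the infinite sum converging to 1.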

Using the same trick over again, we can encode the length of the length of $n$
by defining the pointer representation $\ell^2: \mathbb{N} \to \mathcal{B}^*$ by

$$\ell^2(n) = 0^{|h(|h(n)|)|} \cdot 1 \cdot h(|h(n)|) \cdot h(n),$$

giving a storage cost of

$$|\ell^2(n)| = 2|h(|h(n)|)| + |h(n)| + 1 = 2 \lfloor \log_2 (\lfloor \log_2 (n+1) \rfloor + 1) \rfloor + \lfloor \log_2 (n+1) \rfloor + 1,$$

and, as for $\ell^1$, it can also be verified that the pointer representation $\ell^2$ achieves
Kraft storage:

$$\sum_{n=0}^{\infty} 2^{-|\ell^2(n)|} = \sum_{n=0}^{\infty} 2^{-(2 \lfloor \log_2 (\lfloor \log_2 (n+1) \rfloor + 1) \rfloor + \lfloor \log_2 (n+1) \rfloor + 1)}$$

$$= \sum_{j=0}^{\infty} 2^{-(2 \lfloor \log_2 (j+1) \rfloor + 1 + j)} \cdot 2^{j} = \sum_{j=0}^{\infty} 2^{-(2 \lfloor \log_2 (j+1) \rfloor + 1)} = \sum_{j=0}^{\infty} 2^{-|\ell^1(j)|} = 1.$$

∎

   n    h(n)     ℓ^1(n)         h(|h(n)|)   ℓ^2(n)
   0    λ        1              λ           1
   1    0        010            0           0100
   2    1        011            0           0101
   3    00       00100          1           01100
   4    01       00101          1           01101
   5    10       00110          1           01110
   6    11       00111          1           01111
   7    000      0001000        00          00100000
   8    001      0001001        00          00100001
   9    010      0001010        00          00100010
  10    011      0001011        00          00100011
  11    100      0001100        00          00100100
  12    101      0001101        00          00100101
  13    110      0001110        00          00100110
  14    111      0001111        00          00100111
  15    0000     000010000      01          001010000
  16    0001     000010001      01          001010001
  17    0010     000010010      01          001010010
  18    0011     000010011      01          001010011
  19    0100     000010100      01          001010100
  20    0101     000010101      01          001010101
  21    0110     000010110      01          001010110
  22    0111     000010111      01          001010111
  23    1000     000011000      01          001011000
  24    1001     000011001      01          001011001
  25    1010     000011010      01          001011010
  26    1011     000011011      01          001011011
  27    1100     000011100      01          001011100
  28    1101     000011101      01          001011101
  29    1110     000011110      01          001011110
  30    1111     000011111      01          001011111
  31    00000    00000100000    10          0011000000
  32    00001    00000100001    10          0011000001
  33    00010    00000100010    10          0011000010

Table 5.1. Construction of pointer representations $\ell^1$ and $\ell^2$, for $|\mathcal{B}| = 2$.
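A decoder makes it plain why $\ell^2$ is a representation: the unary prefix tells us how many cells hold $h(|h(n)|)$, which in turn tells us how many cells hold $h(n)$. The following sketch is illustrative only; the names `h`, `unh`, `ell2` and `decode_ell2` are our own.

```python
def h(n):
    """Binary representation of n + 1 with the leftmost 1 deleted."""
    return bin(n + 1)[3:]


def unh(s):
    """Inverse of h: restore the deleted leading 1 and subtract one."""
    return int("1" + s, 2) - 1


def ell2(n):
    """0^{|h(|h(n)|)|}, then 1, then h(|h(n)|), then h(n)."""
    hn = h(n)
    hh = h(len(hn))
    return "0" * len(hh) + "1" + hh + hn


def decode_ell2(bits):
    """Read one pointer from the front of bits.
    Returns (n, number of cells consumed)."""
    z = bits.index("1")              # z = |h(|h(n)|)|
    pos = z + 1
    m = unh(bits[pos:pos + z])       # m = |h(n)|
    pos += z
    n = unh(bits[pos:pos + m])
    return n, pos + m
```

For example, `ell2(7)` gives 00100000 and `ell2(31)` gives 0011000000, matching the corresponding rows of Table 5.1, and decoding recovers $n$ regardless of what follows the pointer in memory.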

The pointer representation construction procedure presented in Example 5.29
can be applied indefinitely, encoding the length of the length of the length of $n$,
etc. It can also be extended to the case where $|\mathcal{B}| > 2$. In order to do this, we make
use of a mod-$|\mathcal{B}|$ successor operation, $\oplus_{|\mathcal{B}|}$, on strings. We define $\oplus_{|\mathcal{B}|}$ so that, e.g.,
$\oplus_2$ corresponds intuitively to addition base 2 with the leftmost 1 deleted:

$$0 \oplus_2 1 = 1, \quad 1 \oplus_2 1 = 00, \quad 00 \oplus_2 1 = 01, \quad \dots, \quad 11 \oplus_2 1 = 000, \quad \dots$$

For $\oplus_3$ we would obtain the sequence of strings

0, 1, 2, 00, 01, 02, 10, ..., 22, 000, 001, ...

Definition. Consider a string

$$s = s_{|s|} s_{|s|-1} \cdots s_2 s_1 \in \mathcal{B}^*$$

and let

$$k = \min\{i \mid s_i \ne |\mathcal{B}| - 1\}.$$

(If $s_i = |\mathcal{B}| - 1$ for $1 \le i \le |s|$, by convention we have $k = |s| + 1$.) Then we
define $s' = s \oplus_{|\mathcal{B}|} 1$ by

$$s'_i = \begin{cases} 0 & \text{for } 1 \le i < k \\ s_i + 1 & \text{for } i = k,\ k \le |s| \\ s_i & \text{for } k < i \le |s| \\ 0 & \text{for } i = k,\ k = |s| + 1 \end{cases}$$

So $|s'| = |s|$ except when $s = \{|\mathcal{B}| - 1\}^{|s|}$, in which case $s' = \{0\}^{|s|+1}$ and $|s'| = |s| + 1$.
We can now define a function $h$ as a $|\mathcal{B}|$-ary string representation of a natural
number $n$.
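The successor operation can be sketched directly from this definition. In the illustration below the name `succ` is our own, strings are written with the most significant symbol first (so $s_1$ is the last character), and we assume a base of at most 10 so that each symbol is a single decimal digit.

```python
def succ(s, base):
    """Return s (+) 1: increment the string s, written most significant
    symbol first, with symbols 0 .. base-1 (base <= 10 assumed)."""
    digits = [int(c) for c in s]
    i = len(digits) - 1              # start at s_1, the rightmost symbol
    while i >= 0 and digits[i] == base - 1:
        digits[i] = 0                # symbols below position k become 0
        i -= 1
    if i < 0:
        digits.insert(0, 0)          # every symbol was maximal: 0^(|s|+1)
    else:
        digits[i] += 1               # position k is incremented
    return "".join(str(x) for x in digits)
```

Starting from the empty string and applying `succ` repeatedly with base 3 yields 0, 1, 2, 00, 01, 02, 10, and so on, the length growing by one exactly when every symbol is maximal.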

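Definition. For $|\mathcal{B}| \ge 2$, let $h_{|\mathcal{B}|}(n)$ be the encoding of $n \in \mathbb{N}$ as a $|\mathcal{B}|$-ary
string, $h_{|\mathcal{B}|}: \mathbb{N} \to \mathcal{B}^*$, where

$$h_{|\mathcal{B}|}(0) = \lambda$$
$$h_{|\mathcal{B}|}(1) = 0$$
$$h_{|\mathcal{B}|}(n+1) = h_{|\mathcal{B}|}(n) \oplus_{|\mathcal{B}|} 1.$$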

For any string b < 8*, for |#| = 1 we by convention define 

h,(6) &b. 

We extend our notation and write $h^{k+1}_{|\mathcal{B}|}(n)$ to indicate $k + 1$ applications of
$h_{|\mathcal{B}|}$:

$$h^{k+1}_{|\mathcal{B}|}(n) \triangleq h^{k}_{|\mathcal{B}|}(|h_{|\mathcal{B}|}(n)|).$$

Where the particular $|\mathcal{B}|$ we are considering is clear, we may simply write $h$ rather
than $h_{|\mathcal{B}|}$. For instance, the function $h$ in Table 5.1 corresponds to $h_2$. Notice that
$1 \cdot h_2(n+1) = 1 \cdot h_2(n) + 1$, where the addition is in base 2.



Example 5.30. For $|\mathcal{B}| = 3$, Table 5.2 illustrates $h_3(n)$ and $h_3^2(n)$. To see how we
can use the above definitions to determine $h_3(n)$, assume we know $h_3(11) = 21$.
So

$$h_3(12) = h_3(11) \oplus_3 1 = 21 \oplus_3 1.$$

Letting $s = 21 = s_2 s_1$, then $\min\{i \mid s_i \ne |\mathcal{B}| - 1\} = 1$ and so

$$s'_i = \begin{cases} s_1 + 1 = 2 & \text{for } i = 1 \\ s_2 = 2 & \text{for } i = 2 \end{cases}$$

Thus,

$$s' = h_3(12) = 22.$$

Similarly, $h_3(13) = h_3(12) \oplus_3 1 = 22 \oplus_3 1$,
and for $s = 22 = s_2 s_1$, then $\min\{i \mid s_i \ne 2\} = 3 = |s| + 1$ and

$$h_3(13) = 000.$$

Using the above notation we have, e.g.,

$$h_3^3(n) = h_3^2(|h_3(n)|) = h_3(|h_3(|h_3(n)|)|).$$

Notice that $|h_3(n)| = 0$ for one value of $n$, $|h_3(n)| = 1$ for three values of $n$,
$|h_3(n)| = 2$ for nine values of $n$, etc. ∎

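In general, since $|s \oplus_{|\mathcal{B}|} 1| = |s|$ except for $s = \{|\mathcal{B}| - 1\}^{|s|}$, we note that the above
definitions, by design, give us the following lemma.

Lemma 5.6. For $|\mathcal{B}| > 1$, there are $|\mathcal{B}|^r$ values of $n \in \mathbb{N}$ such that
$|h_{|\mathcal{B}|}(n)| = r$.

Lemma 5.6 immediately allows us to show the following.

Lemma 5.7. Let $n \in \mathbb{N}$. For $|\mathcal{B}| \ge 2$,

$$|h_{|\mathcal{B}|}(n)| = \lfloor \log_{|\mathcal{B}|} (|\mathcal{B}| - 1)(n+1) \rfloor.$$

For $|\mathcal{B}| = 1$,

$$|h_1(n)| = n.$$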



Proof: For $|\mathcal{B}| = 1$, $|h_1(n)| = n$ by definition. So consider $|\mathcal{B}| \geq 2$. Since Lemma
5.6 tells us that there are $|\mathcal{B}|^r$ values of $n \in \mathbb{N}$ such that $|h_{|\mathcal{B}|}(n)| = r$, we know
there are $\sum_{i=0}^{r} |\mathcal{B}|^i$ values of n such that $|h_{|\mathcal{B}|}(n)| \leq r$, and

$$\begin{aligned}
|h_{|\mathcal{B}|}(n)| &= \min\Bigl\{k \Bigm| n \leq \sum_{i=0}^{k} |\mathcal{B}|^i - 1\Bigr\} \\
 &= \min\Bigl\{k \Bigm| n + 1 \leq \frac{|\mathcal{B}|^{k+1} - 1}{|\mathcal{B}| - 1}\Bigr\} \\
 &= \min\{k \mid (|\mathcal{B}| - 1)(n + 1) \leq |\mathcal{B}|^{k+1} - 1\} \\
 &= \lfloor \log_{|\mathcal{B}|} (|\mathcal{B}|-1)(n + 1) \rfloor.
\end{aligned}$$
∎

  n      h_3(n)    l^1_3(n)     h_3(|h_3(n)|)    l^2_3(n)
  ---------------------------------------------------------
  0      -         2            -                2
  1      0         020          0                0200
  2      1         021          0                0201
  3      2         022          0                0202
  4      00        1200         1                02100
  5      01        1201         1                02101
  6      02        1202         1                02102
  7      10        1210         1                02110
  8      11        1211         1                02111
  9      12        1212         1                02112
  10     20        1220         1                02120
  11     21        1221         1                02121
  12     22        1222         1                02122
  13     000       002000       2                022000
  14     001       002001       2                022001
  15     002       002002       2                022002
  16     010       002010       2                022010
  17     011       002011       2                022011
  18     012       002012       2                022012
  19     020       002020       2                022020
  ...
  38     221       002221       2                022221
  39     222       002222       2                022222
  40     0000      0120000      00               12000000
  41     0001      0120001      00               12000001
  42     0002      0120002      00               12000002
  43     0010      0120010      00               12000010
  44     0011      0120011      00               12000011
  ...
  120    2222      0122222      00               12002222
  121    00000     10200000     01               120100000
  122    00001     10200001     01               120100001

Table 5.2. Construction of pointer representations $\ell^1_3$ and $\ell^2_3$, for $|\mathcal{B}| = 3$.
(An entry "-" denotes the empty string.)
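As a sanity check on Lemma 5.7, the following worked instance (added here for illustration, using only the values from Example 5.30) evaluates the closed form at the point where the length of $h_3(n)$ changes from 2 to 3.

```latex
% |B| = 3, n = 12:  h_3(12) = 22, so the length should be 2
(3-1)(12+1) = 26, \qquad 3^2 = 9 \le 26 < 27 = 3^3
\;\Longrightarrow\; \lfloor \log_3 26 \rfloor = 2 = |22|.

% |B| = 3, n = 13:  h_3(13) = 000, so the length should be 3
(3-1)(13+1) = 28, \qquad 3^3 = 27 \le 28 < 81 = 3^4
\;\Longrightarrow\; \lfloor \log_3 28 \rfloor = 3 = |000|.
```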



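We now define our class of pointer representations, extending the $\ell^1$ and $\ell^2$
of Example 5.29. For $|\mathcal{B}| = 2$ we want:

$$\ell^1_2(n) = 0^{|h(n)|} \cdot 1 \cdot h(n)$$

$$\ell^2_2(n) = 0^{|h(|h(n)|)|} \cdot 1 \cdot h(|h(n)|) \cdot h(n)$$

$$\ell^3_2(n) = 0^{|h(|h(|h(n)|)|)|} \cdot 1 \cdot h(|h(|h(n)|)|) \cdot h(|h(n)|) \cdot h(n)$$

Notice, however, that for $|\mathcal{B}| > 2$ the first component of $\ell^k_{|\mathcal{B}|}$, $0^{|h^k(n)|} \cdot 1$, can in fact
be encoded in base $|\mathcal{B}| - 1$, leaving one unused symbol to serve as the endmarker.

So we can formalize the class of pointer representations as follows.

Definition. Let $|\mathcal{B}| \geq 2$. We can define a class of pointer representations $\ell^k_{|\mathcal{B}|}$,
for k > 0, as follows:

$$\ell^k_{|\mathcal{B}|}(n) = \bigl\{h_{|\mathcal{B}|-1}(|h^k_{|\mathcal{B}|}(n)|) \cdot (|\mathcal{B}| - 1)\bigr\}_0 \cup \Bigl(\bigcup_{i=1}^{k} \{h^i_{|\mathcal{B}|}(n)\}_{n_i}\Bigr)$$

where

$$n_i = 1 + |h_{|\mathcal{B}|-1}(|h^k_{|\mathcal{B}|}(n)|)| + \sum_{j=i+1}^{k} |h^j_{|\mathcal{B}|}(n)|.$$

Therefore

$$\ell^k_{|\mathcal{B}|}(n) = h_{|\mathcal{B}|-1}(|h^k_{|\mathcal{B}|}(n)|) \cdot (|\mathcal{B}|-1) \cdot h^k_{|\mathcal{B}|}(n) \cdot h^{k-1}_{|\mathcal{B}|}(n) \cdots h^2_{|\mathcal{B}|}(n) \cdot h_{|\mathcal{B}|}(n).$$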

Example 5.31. We now verify that the definition behaves as we would like for
$|\mathcal{B}| = 3$, writing h to mean $h_3$. In particular,

$$\ell^1_3(n) = h_2(|h(n)|) \cdot 2 \cdot h(n)$$

$$\ell^2_3(n) = h_2(|h(|h(n)|)|) \cdot 2 \cdot h(|h(n)|) \cdot h(n)$$

$$\ell^3_3(n) = h_2(|h^3(n)|) \cdot 2 \cdot h(|h(|h(n)|)|) \cdot h(|h(n)|) \cdot h(n)$$

Thus,

$$\ell^3_3(n) = \{h_2(|h^3(n)|) \cdot 2\}_0 \cup \Bigl(\bigcup_{i=1}^{3} \{h^i(n)\}_{n_i}\Bigr)$$

where

$$n_1 = |h_2(|h^3(n)|)| + 1 + |h^3(n)| + |h^2(n)|$$

$$n_2 = |h_2(|h^3(n)|)| + 1 + |h^3(n)|$$

$$n_3 = |h_2(|h^3(n)|)| + 1$$

So $\ell^3_3(n) = h_2(|h^3(n)|) \cdot 2 \cdot h^3(n) \cdot h^2(n) \cdot h^1(n)$. ∎
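The construction in Example 5.31 can be checked mechanically against Table 5.2. The sketch below is illustrative only: the names `h` and `ell` are chosen here, `h` again assumes the length-then-lexicographic reading of $h_b$, and the representation is built by plain concatenation as in the closing formula of the definition.

```python
def h(n: int, b: int) -> str:
    """n-th string over {0, ..., b-1} in length-then-lexicographic order (assumed reading)."""
    if b == 1:
        return "0" * n
    r, count, below = 0, 1, 0
    while n >= below + count:
        below += count
        count *= b
        r += 1
    offset, digits = n - below, []
    for _ in range(r):
        offset, d = divmod(offset, b)
        digits.append(str(d))
    return "".join(reversed(digits))


def ell(n: int, k: int, b: int = 3) -> str:
    """Pointer representation: h_{b-1}(|h^k(n)|) . (b-1) . h^k(n) ... h^2(n) . h(n)."""
    parts, m = [], n
    for _ in range(k):
        s = h(m, b)
        parts.append(s)                      # parts[j-1] holds h^j(n)
        m = len(s)
    prefix = h(m, b - 1) + str(b - 1)        # length field, then the reserved symbol b-1
    return prefix + "".join(reversed(parts))


for n in (4, 13, 40, 121):
    print(n, ell(n, 1), ell(n, 2))           # compare with the rows of Table 5.2
```

With b = 2 the prefix degenerates to a run of zeros followed by a 1, which is the binary special case given just before the definition.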

The length $|\ell^k_{|\mathcal{B}|}(n)|$ should immediately be clear.

Theorem 5.18. Let $|\mathcal{B}| \geq 2$, $k \geq 1$. Then

$$|\ell^k_{|\mathcal{B}|}(n)| = |h_{|\mathcal{B}|-1}(|h^k_{|\mathcal{B}|}(n)|)| + 1 + \sum_{j=1}^{k} |h^j_{|\mathcal{B}|}(n)|.$$

While the exact numerical value of $|\ell^k_{|\mathcal{B}|}(n)|$ can be obtained by substituting
$\lfloor \log_{|\mathcal{B}|} (|\mathcal{B}|-1)(n+1) \rfloor$ for $|h_{|\mathcal{B}|}(n)|$ in the expression in Theorem 5.18, we can see
that we essentially have:

$$|\ell^1(n)| \approx 2 \log n,$$

$$|\ell^2(n)| \approx \log n + 2 \log\log n,$$

$$|\ell^3(n)| \approx \log n + \log\log n + 2 \log\log\log n.$$

In any case, we can make the following statement.

Corollary 5.18.1. Let $|\mathcal{B}| \geq 2$, $k \geq 1$. Then

$$|\ell^k_{|\mathcal{B}|}(n)| = O(\log n).$$

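We can now show that each of the pointer representations $\ell^k_{|\mathcal{B}|}$ achieves Kraft
storage.

Theorem 5.19. For $|\mathcal{B}| \geq 2$, $k \geq 1$, each of the pointer representations $\ell^k_{|\mathcal{B}|}$
achieves Kraft storage:

$$\sum_{n=0}^{\infty} |\mathcal{B}|^{-|\ell^k_{|\mathcal{B}|}(n)|} = 1.$$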

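Proof: The proof is by induction on k. Once again, we write $\ell^k$ to denote $\ell^k_{|\mathcal{B}|}$ and
h to denote $h_{|\mathcal{B}|}$.

Basis: For k = 1,

$$\begin{aligned}
\sum_{n=0}^{\infty} |\mathcal{B}|^{-|\ell^1(n)|}
 &= \sum_{n=0}^{\infty} |\mathcal{B}|^{-(|h_{|\mathcal{B}|-1}(|h(n)|)| + 1 + |h(n)|)} && \text{by Lemma 5.8} \\
 &= \sum_{r=0}^{\infty} |\mathcal{B}|^{-(|h_{|\mathcal{B}|-1}(r)| + 1 + r)} \cdot |\mathcal{B}|^{r} && \text{by Lemma 5.6} \\
 &= |\mathcal{B}|^{-1} \sum_{r=0}^{\infty} |\mathcal{B}|^{-|h_{|\mathcal{B}|-1}(r)|} \\
 &= |\mathcal{B}|^{-1} \sum_{j=0}^{\infty} |\mathcal{B}|^{-j} (|\mathcal{B}| - 1)^{j} && \text{by Lemma 5.6} \\
 &= 1.
\end{aligned}$$

Induction step: Assume the result holds for k; i.e., assume that

$$\sum_{n=0}^{\infty} |\mathcal{B}|^{-|\ell^k(n)|} = 1.$$

Then

$$\begin{aligned}
\sum_{n=0}^{\infty} |\mathcal{B}|^{-|\ell^{k+1}(n)|}
 &= \sum_{n=0}^{\infty} |\mathcal{B}|^{-(|h_{|\mathcal{B}|-1}(|h^{k+1}(n)|)| + 1 + \sum_{i=1}^{k+1} |h^i(n)|)} \\
 &= \sum_{n=0}^{\infty} |\mathcal{B}|^{-(|h_{|\mathcal{B}|-1}(|h^{k}(|h(n)|)|)| + 1 + |h(n)| + \sum_{i=1}^{k} |h^{i+1}(n)|)} \\
 &= \sum_{j=0}^{\infty} |\mathcal{B}|^{-(|h_{|\mathcal{B}|-1}(|h^{k}(j)|)| + 1 + j + \sum_{i=1}^{k} |h^{i}(j)|)} \cdot |\mathcal{B}|^{j} \\
 &= \sum_{j=0}^{\infty} |\mathcal{B}|^{-(|h_{|\mathcal{B}|-1}(|h^{k}(j)|)| + 1 + \sum_{i=1}^{k} |h^{i}(j)|)} \\
 &= \sum_{j=0}^{\infty} |\mathcal{B}|^{-|\ell^{k}(j)|} \\
 &= 1.
\end{aligned}$$
∎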
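The Kraft sum of Theorem 5.19 can be explored numerically. The sketch below is a hedged illustration rather than part of the proof: `hlen`, `ell_len`, and `kraft_partial` are names chosen here, `hlen` assumes the length-then-lexicographic reading of $h_b$, and `ell_len` simply evaluates the length formula of Theorem 5.18. Exact rational arithmetic is used so that partial sums can be compared without rounding.

```python
from fractions import Fraction


def hlen(n: int, b: int) -> int:
    """|h_b(n)|: length of the n-th string over a b-symbol alphabet (assumed ordering)."""
    if b == 1:
        return n
    r, count, total = 0, 1, 1          # total = number of strings of length <= r
    while n >= total:
        r += 1
        count *= b
        total += count
    return r


def ell_len(n: int, k: int, b: int) -> int:
    """Length of the k-level pointer representation, per Theorem 5.18."""
    lengths, m = [], n
    for _ in range(k):
        m = hlen(m, b)
        lengths.append(m)              # lengths[j-1] = |h^j(n)|
    return hlen(lengths[-1], b - 1) + 1 + sum(lengths)


def kraft_partial(b: int, k: int, N: int) -> Fraction:
    """Sum of b**(-|ell^k_b(n)|) over n = 0, ..., N-1."""
    return sum(Fraction(1, b ** ell_len(n, k, b)) for n in range(N))


# N = 1093 covers every n whose ternary string h_3(n) has length at most 6.
print(kraft_partial(3, 1, 1093), kraft_partial(3, 2, 1093))
```

The partial sums stay below 1 and increase as more lengths are included, which is the behavior one would expect if the full series sums to exactly 1.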



Since each of the pointer schemes $\ell^k$ achieves Kraft storage, it follows from Theorem
5.13 that a pointer representation which uses $\ell^k$ also achieves Kraft storage if the list
component is storage efficient.

Corollary 5.19.1. Consider a separate concatenation-preserving pointer
representation $\rho: \mathbb{D} \to \mathcal{B}^*$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{\ell^k_{|\mathcal{B}|}(|d|)\}.$$

If f achieves Kraft storage, then $\rho$ also achieves Kraft storage.

So we have presented a pointer encoding scheme which allows us to represent
lists of unbounded length and also achieve Kraft storage. Consider a fixed position
field, separate, concatenation-preserving pointer representation $\rho$, and let us see
how well one can do for access. We already know, of course, that we cannot
achieve Kraft access. So suppose we want to answer some table lookup question
$q_i \in T$. We can do this by reading the pointer in order to determine whether or
not $|d| \geq i$. If it is not, then we immediately return the answer $\varnothing$. If it is, then we
go to the appropriate memory location to read the answer. So at worst we need to
make

$$|\ell^{i}_{|\mathcal{B}|}(|d|)| + k \approx \log^{i}_{|\mathcal{B}|} |d| + \sum_{j=1}^{i} \log^{j}_{|\mathcal{B}|} |d| + k$$

accesses, where $\log^{j}$ denotes the j-fold iterated logarithm and k is some constant
depending on the function f, at most the size of a field $n_i$. We can often do even
better by only reading enough of the pointer to determine if $|d| \geq i$, but, of course,
for $|d| = i$ we would be forced to read $|\ell(|d|)| + k = O(\log |d|)$ cells. We shall
discuss this encoding in the context of stacks in Section 6.4.

CHAPTER 6

STACKS 

In Chapter 1 we discussed what we mean by a stack, a linear list for which all 
insertions and deletions are made at the top of the stack. Much work has been 
done to obtain formal specifications of the stack as a data type (see e.g., Liskov and 
Zilles [18], Lehman and Smyth CH3), but such a formal definition is unimportant 
for our purposes. Any scheme which captures our intuitive notion of a stack would 
suffice. It is our goal to apply some of the techniques we have thus far developed
to analyze some stack implementations in terms of Kraft storage and access. We first
define the basic stack operations and in the following sections we examine
endmarker and pointer stack representations. Table 6.3 at the end of the chapter
summarizes some of the lower bound results. 

6.1 Stack Operations 

While there are various operations we might wish to consider, any stack 
implementations will have PUSH and POP operations. These are presumably the 
only update operations that we shall want to perform on a stack. We also want 
some way to read elements in the list; we at least need to be able to read the top 
stack element. So we begin by formally defining these three stack operations: 
PUSH, POP, TOP. 

Viewed in the problem domain, a PUSH operation causes a new value in X to 
be inserted at the top of the stack, thereby increasing the stack length by one. So a 
PUSH is a pure update, provided the stack can grow indefinitely. Where the 
memory size L is bounded, some sort of "Error" statement must be returned if an 
attempt is made to PUSH a value onto a stack which has no room to grow. Thus, 
we define a PUSH operation to consist of both a question and an update. 






Because we are considering only domains of the form $\mathbb{D} = \bigcup_{i \in J} \{X^i\}$ and a
PUSH operation will cause a stack to increase in size by one, it makes little sense to
consider domains where $i, i + 2 \in J$ but $i + 1 \notin J$. So for simplicity we shall
henceforth assume that $\mathbb{D} = \bigcup_{i=0}^{L} \{X^i\}$, where L may be infinite.

For the problem domains we are considering, if $b \in X$ and $b \in \mathbb{D}$, then
$X^1 \subseteq \mathbb{D}$. So any value in X can be pushed onto a stack at any time, and there are
in general $|X|$ different PUSH operations. The following definition states more
formally what we mean in the problem domain by a PUSH operation.



Definition. In any problem domain $\mathbb{D} = \bigcup_{i=0}^{L} \{X^i\}$, we define the class of
PUSH operations

$$F_{PUSH} = \{f_{PUSHx} \mid x \in X\},$$

where each PUSH operation $f_{PUSHx}$ consists of a question component and an
update component:

$$f_{PUSHx} = \langle q_{PUSHx}, u_{PUSHx} \rangle.$$

For any $d \in \mathbb{D}$,

$$q_{PUSHx}(d) = \begin{cases} \varnothing & \text{if } |d| < L \\ \text{Error} & \text{if } |d| = L \end{cases}$$

and

$$u_{PUSHx}(d) = \begin{cases} d \cup \{(|d|, x)\} & \text{if } q_{PUSHx}(d) = \varnothing \\ d & \text{else} \end{cases}$$

If L is infinite, then any finite stack is allowed, and we always have

$$q_{PUSHx}(d) = \varnothing$$

$$u_{PUSHx}(d) = d \cup \{(|d|, x)\}$$

and so we can view a PUSH operation as a pure update.

Similarly, a POP operation also consists of a question and an update portion.
A POP causes the top stack element to be removed; i.e., the stack length is
decreased by one. If the stack is already empty, however, then its length
should not be decreased and some sort of "Error" must be returned.

Definition. For any problem domain $\mathbb{D} = \bigcup_{i=0}^{L} \{X^i\}$ and any $d \in \mathbb{D}$, we
define a POP operation $f_{POP}$ by

$$f_{POP} = \langle q_{POP}, u_{POP} \rangle,$$

where

$$q_{POP}(d) = \begin{cases} \varnothing & \text{if } |d| > 0 \\ \text{Error} & \text{if } |d| = 0 \end{cases}$$

and

$$u_{POP}(d) = \{(n, d(n)) \mid 0 \leq n < |d| - 1\}.$$

Note that $u_{POP}(d) = \varnothing$ when $|d| = 0$ (as well as when $|d| = 1$). We have defined
the POP operation to be a pure update when $|d| \neq 0$.

We read the stack via the top element, using the operation TOP. Since the
stack state is not altered, $u_{TOP}(d) = d$ and TOP is defined as a pure question.

Definition. For any problem domain $\mathbb{D} = \bigcup_{i=0}^{L} \{X^i\}$ and any $d \in \mathbb{D}$, we
define the TOP operation $f_{TOP}$ as a pure question:

$$f_{TOP}(d) = q_{TOP}(d) = \begin{cases} d(|d| - 1) & \text{if } |d| > 0 \\ \text{Error} & \text{if } |d| = 0 \end{cases}$$



We might have chosen to define a POP operation so as to return the value 
which it deletes from the top of the stack. Instead, we define another operation, 
TPOP, to serve as a combination TOP and POP operation. 



Definition. For any problem domain $\mathbb{D} = \bigcup_{i=0}^{L} \{X^i\}$ and any $d \in \mathbb{D}$, we
define the TPOP operation $f_{TPOP}$ by

$$f_{TPOP} = \langle q_{TPOP}, u_{TPOP} \rangle,$$

where $q_{TPOP}(d) = q_{TOP}(d)$

and $u_{TPOP}(d) = u_{POP}(d)$.

So TPOP returns an "Error" message precisely when \d\ = 0. In general, we choose 
to discuss separately the component TOP and POP operations and only occasionally 
make reference to the TPOP operation. 
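The four problem-domain operations defined above can be sketched as pure functions that return an (answer, new stack) pair. This is an illustrative model only: the stack is represented here as a Python tuple, `None` stands in for the empty answer of a pure update, and the names `push`, `pop`, `top`, `tpop`, and the parameter `L` (with `None` meaning an unbounded memory) are choices made for this sketch.

```python
from typing import Optional, Tuple

ERROR = "Error"          # the "Error" answer
NULL = None              # stands in for the empty answer of a pure update

Stack = Tuple[str, ...]


def push(d: Stack, x: str, L: Optional[int] = None):
    """PUSHx: Error on a full stack (|d| = L), otherwise append x."""
    if L is not None and len(d) == L:
        return ERROR, d
    return NULL, d + (x,)


def pop(d: Stack):
    """POP: Error on the empty stack, otherwise drop the top element."""
    if len(d) == 0:
        return ERROR, d
    return NULL, d[:-1]


def top(d: Stack):
    """TOP: a pure question; the stack is returned unchanged."""
    return (ERROR if len(d) == 0 else d[-1]), d


def tpop(d: Stack):
    """TPOP: the answer of TOP combined with the update of POP."""
    answer, _ = top(d)
    _, new_d = pop(d)
    return answer, new_d


print(tpop(("a", "b")))   # answer is the old top; the stack shrinks by one
```

Keeping the answer and the update separate mirrors the question/update split in the definitions, so a composite operation such as TPOP is just a pairing of the two components.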

We have defined the basic stack operations that we shall consider. Notice that 
a PUSH or POP operation causes the stack size to increment or decrement by at 
most one. It is also possible to execute the composition of a fixed sequence of 
operations; e.g., to push a sequence of k symbols onto the stack. We might extend 
this notion and consider the execution of a conditional sequence of operations in 
which the operation to be executed next (if any) depends on the answer sequence 
returned by the operations performed so far. For instance, there might be an 
operation to clear the stack ; i.e., POP until stack is empty. 

We shall in the rest of the chapter consider several stack representations and 
see how efficiently it is possible to perform the basic stack operations. Recall that 
the operation definitions we have presented describe behavior in the problem 
domain; for a particular representation, the operation behavior in the machine 
domain might or might not resemble the problem domain behavior.

6.2 The TOS Endmarker Representation

Consider using an endmarker representation to implement a stack and the 
PUSH, POP, TOP operations. The following example illustrates one possible such 
implementation. 

Example 6.1. Let $X = \{a,b\}$, $\mathcal{B} = \{0,1,2\}$, and $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. Let the function
$f: X \cup \{\varnothing\} \to \mathcal{B}^*$ be defined by

    f(a) = 0
    f(b) = 1
    f(∅) = 2

Define the concatenation-preserving endmarker representation $\rho: \mathbb{D} \to \mathcal{B}^*$ by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{i-1} \cup \{f(\varnothing)\}_{|d|}.$$

In this representation, one symbol from $\mathcal{B}$, namely 2, is reserved to tell us when we
have reached the top of the stack. For instance,

    ρ(∅)      = 2
    ρ(abaa)   = 01002
    ρ(baabba) = 1001102

If we view each $d \in \mathbb{D}$ as a stack, then we might implement the POP, PUSHx, and
TOP operations by first reading the stack representation from left to right until we
detect the end-of-stack marker 2. For a POP operation, we then back up and put
the endmarker in the previous cell. Assuming L is unbounded, this corresponds to
the following algorithm.

    A_POP:    i ← 0
              while m(i) ≠ 2 do i ← i + 1
              if i = 0 then return "Error"
              else m(i - 1) ← 2

For PUSHx and TOP operations, we similarly read until we detect the end of stack
marker 2, and we can then immediately perform the desired operation.

    A_PUSHx:  i ← 0
              while m(i) ≠ 2 do i ← i + 1
              m(i) ← f(x)
              m(i+1) ← 2

    A_TOP:    i ← 0
              while m(i) ≠ 2 do i ← i + 1
              if i = 0 then return "Error"
              else if m(i-1) = 0 then return "a"
              else return "b"

These algorithms give us the following access costs when no Error conditions are
encountered:

$$\#[\mathcal{A}_{POP}(\rho(d))] = |d| + 2$$

$$\#[\mathcal{A}_{PUSHx}(\rho(d))] = |d| + 2$$

$$\#[\mathcal{A}_{TOP}(\rho(d))] = |d| + 2$$

We could improve slightly the access cost for $\mathcal{A}_{TOP}$ by remembering the previous
cell value in some location called "temp", as we make our left to right reading of
the stack representation.

    A_TOP:    i ← 0
              while m(i) ≠ 2 do temp ← m(i)
                                i ← i + 1
              if i = 0 then return "Error"
              else if temp = 0 then return "a"
              else return "b"

This modified algorithm gives us a memory cell access cost of

$$\#[\mathcal{A}_{TOP}(\rho(d))] = |d| + 1.$$

Although temp can be viewed as requiring additional cells, we choose to let temp be
part of our processor state, and so we do not include it in the memory access cost. ∎
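The algorithms of Example 6.1 can be simulated to reproduce the quoted access costs. The sketch below is an illustration under one stated assumption about the cost model: consecutive accesses to the same cell (for instance, reading the endmarker cell and then overwriting it) are counted as a single access, since that convention yields the costs |d| + 2, |d| + 2, |d| + 2 and |d| + 1 given in the example. The class `Memory` and the helper names are introduced here and are not from the text.

```python
class Memory:
    """Cellular memory that counts accesses (same-cell repeats counted once: an assumption)."""

    def __init__(self, cells):
        self.cells = list(cells)
        self.accesses = 0
        self._last = None

    def _touch(self, i):
        if i != self._last:
            self.accesses += 1
            self._last = i

    def read(self, i):
        self._touch(i)
        return self.cells[i]

    def write(self, i, value):
        self._touch(i)
        if i == len(self.cells):
            self.cells.append(value)       # grow the unbounded memory by one cell
        else:
            self.cells[i] = value


F = {"a": 0, "b": 1}
END = 2


def rho(d):
    return [F[x] for x in d] + [END]


def find_end(m):
    i = 0
    while m.read(i) != END:
        i += 1
    return i


def pop(m):
    i = find_end(m)
    if i == 0:
        return "Error"
    m.write(i - 1, END)


def push(m, x):
    i = find_end(m)
    m.write(i, F[x])
    m.write(i + 1, END)


def top(m):
    i = find_end(m)
    if i == 0:
        return "Error"
    return "a" if m.read(i - 1) == 0 else "b"


def top_with_temp(m):
    i, temp = 0, None
    while True:
        v = m.read(i)
        if v == END:
            break
        temp, i = v, i + 1                 # temp lives in the processor state
    if i == 0:
        return "Error"
    return "a" if temp == 0 else "b"


def cost(op, d, *args):
    m = Memory(rho(d))
    op(m, *args)
    return m.accesses


print(cost(pop, "abaa"), cost(push, "abaa", "b"), cost(top, "abaa"), cost(top_with_temp, "abaa"))
```

Counting every read and write separately would instead charge PUSHx one extra access, so the choice of convention matters when comparing against the lower bounds proved later in the section.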

A representation such as $\rho$ in Example 6.1 is a natural one to use if we choose
to implement a stack with an endmarker representation. We assume the bottom of
the stack is at some fixed (known) location, and we reserve some string $f(\varnothing) \in \mathcal{B}^+$ to
denote the top of the stack. We shall also require that a TOS endmarker
representation have fixed position fields and that $D(f(\varnothing)) \subseteq \bigcup_{x \in X} D(f(x))$; the
reasons for these assumptions will be made clear shortly. We now make the
following definition.

Definition. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f: X \cup \{\varnothing\} \to \mathcal{B}^+$. Let
$\rho: \mathbb{D} \to \mathcal{B}^+$ be any fixed position field endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(\varnothing)\}_{n_{|d|+1}}$$

where $n_i \in \mathbb{N}$ for any $i \in \mathbb{N}^+$, and $f(\varnothing)$ is the endmarker. If
$D(f(\varnothing)) \subseteq \bigcup_{x \in X} D(f(x))$, then
we refer to $\rho$ as a top of stack (TOS) endmarker representation.

Clearly the representation $\rho$ in Example 6.1 is a TOS endmarker representation. We
use the term TOS because the endmarker is always situated in the set of cells
which the stack element $d(|d|+1)$ would occupy, if there were one. In other words,
the endmarker is in the field at the top of the stack. The representation is easiest
to visualize when $n_{i+1} > n_i$ and each field consists of contiguous memory cells. Notice also that
it is not necessary that each field have size one. The following example illustrates
another TOS endmarker representation and shows that we need not restrict
ourselves to the case where $|\mathcal{B}| \geq |X| + 1$.

Example 6.2. Let $X = \{a,b\}$, $\mathcal{B} = \{0,1\}$, and $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. Define the function
$f: X \cup \{\varnothing\} \to \mathcal{B}^*$ by

    f(a) = 00
    f(b) = 01
    f(∅) = 1

Then we can define the TOS endmarker representation $\rho: \mathbb{D} \to \mathcal{B}^*$ by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{2(i-1)} \cup \{f(\varnothing)\}_{2|d|}.$$

For instance, we have

    ρ(abaa)   = 000100001
    ρ(baabba) = 0100000101001

Similar to what we did in Example 6.1, we can implement the POP, PUSHx, and
TOP operations by first reading $\rho(d)$ from left to right until we detect the end of
stack marker. However, since $|f(a)| = |f(b)| = 2$, we can locate the endmarker by reading only
cells 0, 2, 4, ... , until we detect a 1. We then perform the desired operation in a
straightforward way. Thus, assuming L is unbounded, we might use the following
algorithms.

    A_POP:    i ← 0
              while m(i) ≠ 1 do i ← i + 2
              if i = 0 then return "Error"
              else m(i-2) ← 1

    A_PUSHx:  i ← 0
              while m(i) ≠ 1 do i ← i + 2
              m(i) ← 0
              if x = a then m(i+1) ← 0
              if x = b then m(i+1) ← 1
              m(i+2) ← 1

    A_TOP:    i ← 0
              while m(i) ≠ 1 do i ← i + 2
              if i = 0 then return "Error"
              else if m(i-1) = 0 then return "a"
              else return "b"

These produce the following access costs, when no Error conditions are encountered:

$$\#[\mathcal{A}_{POP}(\rho(d))] = |d| + 2$$

$$\#[\mathcal{A}_{PUSHx}(\rho(d))] = |d| + 3$$

$$\#[\mathcal{A}_{TOP}(\rho(d))] = |d| + 2$$
∎
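The binary representation of Example 6.2 can be simulated in the same spirit. This is a sketch under the same assumed cost convention (back-to-back accesses to one cell are counted once); the harness `run` and the other names are hypothetical helpers introduced for illustration.

```python
CODE = {"a": (0, 0), "b": (0, 1)}      # f(a) = 00, f(b) = 01; the endmarker is the single symbol 1


def rho(d):
    cells = []
    for x in d:
        cells.extend(CODE[x])
    return cells + [1]


def run(op, cells, *args):
    """Apply op to a copy of cells; return (result, final cells, access count)."""
    mem, trace = list(cells), []

    def read(i):
        trace.append(i)
        return mem[i]

    def write(i, v):
        trace.append(i)
        while len(mem) <= i:
            mem.append(0)              # unbounded memory; the filler value is arbitrary
        mem[i] = v

    result = op(read, write, *args)
    # Assumption: consecutive accesses to the same cell count as one access.
    count = sum(1 for j, a in enumerate(trace) if j == 0 or a != trace[j - 1])
    return result, mem, count


def find_end(read):
    i = 0
    while read(i) != 1:                # only the even cells are inspected
        i += 2
    return i


def pop(read, write):
    i = find_end(read)
    if i == 0:
        return "Error"
    write(i - 2, 1)


def push(read, write, x):
    i = find_end(read)
    write(i, 0)
    write(i + 1, CODE[x][1])
    write(i + 2, 1)


def top(read, write):
    i = find_end(read)
    if i == 0:
        return "Error"
    return "a" if read(i - 1) == 0 else "b"


for op, args in ((pop, ()), (push, ("b",)), (top, ())):
    print(op.__name__, run(op, rho("abaa"), *args)[2])
```

Although the representation is twice as long as in Example 6.1, only every second cell is read while searching for the endmarker, which is why the costs still grow like |d| rather than 2|d|.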

In both Examples 6.1 and 6.2 we found that the access costs for the stack
operations POP, PUSHx, and TOP grow with $|d|$. This leads us to wonder whether
it is ever possible to perform the operations with fewer accesses. We shall prove
that the answer is no. In particular, whenever a TOS endmarker representation is
used we show that for each $d \in \mathbb{D}$ it must be the case that

$$\#[\mathcal{A}_{POP}(\rho(d))] \geq |d| + 1$$

$$\#[\mathcal{A}_{PUSHx}(\rho(d))] \geq |d| + 2$$

$$\#[\mathcal{A}_{TOP}(\rho(d))] \geq \begin{cases} \left\lceil \frac{|d| - 1}{2} \right\rceil + 2 & \text{for } |d| > 0 \\ 1 & \text{for } |d| = 0 \end{cases}$$



To aid us in proving these results we prove three lemmas. The first says that
when $|d| = 0$ any algorithm for a POP, PUSHx, or TOP operation will access the
$n_1$ field, which is also the $n_{|d|+1}$ field.

Lemma 6.1. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and let $d_0 \in \mathbb{D}$, $|d_0| = 0$. Consider a function
$f: X \cup \{\varnothing\} \to \mathcal{B}^+$. Let $\rho: \mathbb{D} \to \mathcal{B}^+$ be any TOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(\varnothing)\}_{n_{|d|+1}}.$$

Then any implementation of a POP, a PUSHx, or a TOP operation on data
base $d_0 \in \mathbb{D}$ must access some cell in the $n_1 = n_{|d_0|+1}$ field.

Proof: If $|d_0| = 0$, then $\rho(d_0) = \{f(\varnothing)\}_{n_1}$. Thus, if a stack operation is performed
without accessing the $n_1$ field, then no cells in $\rho(d_0)$ were accessed at all. Even if
we accessed every one of the (infinite number of) other memory cells, we would get
no information concerning whether or not $|d| = 0$. Effectively, this says that we
were able to perform the operation with no accesses, an impossibility. ∎

Lemma 6.2 guarantees that performing a stack operation on any $d \in \mathbb{D}$ causes the
$n_1$ field to be accessed.

Lemma 6.2. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f: X \cup \{\varnothing\} \to \mathcal{B}^+$. Let
$\rho: \mathbb{D} \to \mathcal{B}^+$ be any TOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(\varnothing)\}_{n_{|d|+1}}.$$

Then for all $d \in \mathbb{D}$ any implementation of a POP, a PUSHx, or a TOP
operation must access some cell in the field $n_1$.

Proof: Let $d_0$ be some stack, $d_0 \in \mathbb{D}$. As a consequence of Lemma 6.1, the result
of this lemma clearly holds for $|d_0| = 0$. So consider the case where $|d_0| > 0$, and
let $m_0$ be a memory state which contains the representation of $d_0$; $m_0 \supseteq \rho(d_0)$.
Assume there is an algorithm $\mathcal{A}_{OP}$ for the stack operation OP (one of POP,
PUSHx, TOP) such that the set of cells accessed by $\mathcal{A}_{OP}(\rho(d_0))$ does not contain
any cells in the $n_1$ field.
Let $m_1$ be a memory state which differs from $m_0$ only in the contents of field $n_1$:

$$m_1 = \{(n, m_0(n)) \mid n \notin D(\text{field } n_1)\} \cup \{f(\varnothing)\}_{n_1}.$$

So $m_1$ represents the empty stack $d_1$, $|d_1| = 0$. Since $\mathcal{A}_{OP}$ does not access the $n_1$
field when applied to memory state $m_0$, it also does not access the field $n_1$ for the
memory state $m_1$. Thus, $\mathcal{A}_{OP}$ performs the same operation in either case. Let us
now look separately at the three stack operations.

(i) Consider the operation POP, and notice that

$$q_{POP}(d_1) = \text{Error}$$

whereas $q_{POP}(d_0) \neq \text{Error}$.

Thus $\mathcal{A}_{POP}$ cannot always operate correctly without accessing the $n_1$ field.

(ii) Similarly, $\mathcal{A}_{TOP}$ cannot always give the right answer without accessing the field
$n_1$, because

$$q_{TOP}(d_1) = \text{Error}$$

whereas $q_{TOP}(d_0) \neq \text{Error}$.

(iii) $\mathcal{A}_{PUSHx}$ will write an endmarker in field $n_2$ if and only if the current memory state
contains a representation of the empty stack.

Thus, for all $d \in \mathbb{D}$, an algorithm which implements a POP, a PUSHx, or a TOP
operation will access field $n_1$. ∎

It is also necessary that the endmarker field be accessed, as the following lemma
shows.

Lemma 6.3. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f: X \cup \{\varnothing\} \to \mathcal{B}^+$. Let
$\rho: \mathbb{D} \to \mathcal{B}^+$ be any TOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(\varnothing)\}_{n_{|d|+1}}.$$

Then for all $d \in \mathbb{D}$ any implementation of a POP, a PUSHx, or a TOP
operation must access the $n_{|d|+1}$ field.

Proof: For a PUSHx operation,

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(\varnothing)\}_{n_{|d|+1}}$$

and

$$\rho(u_{PUSHx}(d)) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(x)\}_{n_{|d|+1}} \cup \{f(\varnothing)\}_{n_{|d|+2}}$$

and so field $n_{|d|+1}$ must be not only accessed but rewritten.

The rest of the proof is similar to that of Lemma 6.2. Lemma 6.1 shows that
this lemma holds for any $d_0 \in \mathbb{D}$ such that $|d_0| = 0$. So consider the case $|d_0| > 0$, and
let $m_0$ be a memory state such that $m_0 \supseteq \rho(d_0)$. Assume there is an algorithm $\mathcal{A}_{OP}$
for the stack operation OP such that performing $\mathcal{A}_{OP}(\rho(d_0))$ does not cause any
cell in the $n_{|d_0|+1}$ field to be accessed. Choose $k \in \mathbb{N}$ such that the $n_k$ and the $n_{k+1}$
fields are not accessed (e.g., choose $k > |d_0| + 1$). Now define a memory state $m_1$
that differs from $m_0$ only in fields $n_{|d_0|+1}$, $n_k$, and $n_{k+1}$:

$$m_1 = \{(n, m_0(n)) \mid n \notin D(\text{field } n_k),\ n \notin D(\text{field } n_{k+1}),\ n \notin D(\text{field } n_{|d_0|+1})\}
 \cup \{f(x_1)\}_{n_{|d_0|+1}} \cup \{f(x_2)\}_{n_k} \cup \{f(\varnothing)\}_{n_{k+1}}$$

where $x_1$ is any element in X and $x_2 \in X$ such that $x_2 \neq d_0(|d_0|)$. Pick $d_1 \in \mathbb{D}$
such that $\rho(d_1) \subseteq m_1$. Since $\mathcal{A}_{OP}$ accesses neither the $n_{|d_0|+1}$ field, the $n_k$ field, nor the
$n_{k+1}$ field, $\mathcal{A}_{OP}$ is not a correct algorithm for either of the stack operations POP or
TOP, because no such implementation can perform correctly for both $d_0$ and $d_1$.
(This same argument also would include the PUSHx operation.) Thus, any
algorithm $\mathcal{A}_{OP}$ must access field $n_{|d|+1}$. ∎

We can now prove our lower bound results for the number of memory cell accesses
required to perform any POP or PUSHx operation using a TOS endmarker
representation.

Theorem 6.1. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f: X \cup \{\varnothing\} \to \mathcal{B}^+$. Let
$\rho: \mathbb{D} \to \mathcal{B}^+$ be any TOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(\varnothing)\}_{n_{|d|+1}}.$$

Then for all $d \in \mathbb{D}$ any implementation of a POP operation requires at least
$|d| + 1$ memory cell accesses, and any implementation of PUSHx requires
$|d| + 2$ accesses; i.e., for all $d \in \mathbb{D}$,

$$\#[\mathcal{A}_{POP}(\rho(d))] \geq |d| + 1$$

$$\#[\mathcal{A}_{PUSHx}(\rho(d))] \geq |d| + 2$$

Proof: Any implementation of a PUSHx or a POP operation using $\rho$ will result in:

$$\rho(u_{PUSHx}(d)) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(x)\}_{n_{|d|+1}} \cup \{f(\varnothing)\}_{n_{|d|+2}}$$

$$\rho(u_{POP}(d)) = \bigcup_{i=1}^{|d|-1} \{f(d(i))\}_{n_i} \cup \{f(\varnothing)\}_{n_{|d|}}$$

Assume there is some algorithm $\mathcal{A}_{OP}$, for POP or PUSHx, for which there is some
p, $1 \leq p \leq |d_0|$, such that no cell in $n_p$ is accessed by $\mathcal{A}_{OP}(\rho(d_0))$, for $d_0 \in \mathbb{D}$. Let $m_0$
be a memory state such that $m_0 \supseteq \rho(d_0)$, and define a memory state $m_1$ that
differs from $m_0$ only in field $n_p$:

$$m_1 = \{(n, m_0(n)) \mid n \notin D(\text{field } n_p)\} \cup \{f(\varnothing)\}_{n_p}.$$

Choose $d_1 \in \mathbb{D}$ such that $\rho(d_1) \subseteq m_1$. Since $D(f(\varnothing)) \subseteq \bigcup_{x \in X} D(f(x))$, the endmarker
is located entirely in the $n_p$ field and so $\mathcal{A}_{OP}$ does not distinguish $\rho(d_1)$ from
$\rho(d_0)$. Performing a PUSHx or a POP operation on $d_1$ would give:

$$\rho(u_{PUSHx}(d_1)) = \bigcup_{i=1}^{p-1} \{f(d(i))\}_{n_i} \cup \{f(x)\}_{n_p} \cup \{f(\varnothing)\}_{n_{p+1}}$$

$$\rho(u_{POP}(d_1)) = \bigcup_{i=1}^{p-2} \{f(d(i))\}_{n_i} \cup \{f(\varnothing)\}_{n_{p-1}}$$

Thus, we must be able to distinguish $|d_1|$ from $|d_0|$ in order for a PUSHx or POP
operation to necessarily be performed correctly. Since the argument holds for any
p, $1 \leq p \leq |d|$, we need to access at least $|d|$ cells. By Lemma 6.3, it is also
necessary to detect the endmarker, leading to one additional access and a lower
bound of $|d| + 1$ for both POP and PUSHx operations. Notice that for a PUSHx
operation, it is, in addition, necessary to write in the $n_{|d|+2}$ field, which gives the
$|d| + 2$ lower bound for the PUSHx operation. ∎

Whenever f achieves Kraft storage, then $D(f(\varnothing)) \subseteq \bigcup_{x \in X} D(f(x))$, and so we have the
following corollary.

Corollary 6.1.1. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f: X \cup \{\varnothing\} \to \mathcal{B}^+$. Let
$\rho: \mathbb{D} \to \mathcal{B}^+$ be any TOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(\varnothing)\}_{n_{|d|+1}}.$$

If f achieves Kraft storage, then for all $d \in \mathbb{D}$ any implementation of a POP
operation requires at least $|d| + 1$ memory cell accesses, and any
implementation of PUSHx requires $|d| + 2$ accesses; i.e.,

$$\#[\mathcal{A}_{POP}(\rho(d))] \geq |d| + 1$$

$$\#[\mathcal{A}_{PUSHx}(\rho(d))] \geq |d| + 2$$

We have chosen to require that a TOS endmarker representation have fixed
position fields and that $D(f(\varnothing)) \subseteq \bigcup_{x \in X} D(f(x))$, because these seem to be natural
requirements that are met in most implementations. As Example 6.3 illustrates, 
however, if we were to eliminate the condition that the fields be in fixed positions, 
then we might sometimes be able to achieve lower access costs than were specified 
by Theorem 6.1. 



Example 6.3. Let $X = \{a,b\}$, $\mathcal{B} = \{0,1\}$, and $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. Consider the storage
optimal function $f: X \cup \{\varnothing\} \to \mathcal{B}^*$ defined by

    f(a) = 0
    f(b) = 10
    f(∅) = 11

Construct from f the concatenation-preserving representation $\rho: \mathbb{D} \to \mathcal{B}^*$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{11\}_{n(d)}$$

where

$$n_i(d) = \sum_{j=1}^{i-1} |f(d(j))|$$

and

$$n(d) = \sum_{j=1}^{|d|} |f(d(j))|.$$

Then we have, for instance:

    ρ(aaaaaa) = 00000011
    ρ(bbbbbb) = 10101010101011
    ρ(aabaab) = 0010001011

Notice that the leftmost occurrence of 11 indicates the end of the stack. It is not
necessary, however, to read every element in the stack representation. For instance,
when m(i) = 0 and m(i+2) = 0, then there is no need to read m(i+1). Thus, we
could implement POP and PUSH as follows.

    A_POP:    i ← 1
        loop: while m(i) ≠ 1 do i ← i + 2
              if m(i-1) = 0 then i ← i + 1
                                 goto loop
              if i = 1 then return "Error"
              else m(i-2) ← 1

    A_PUSHa:  i ← 1
        loop: while m(i) ≠ 1 do i ← i + 2
              if m(i-1) = 0 then i ← i + 1
                                 goto loop
              m(i-1) ← 0
              m(i+1) ← 1

    A_PUSHb:  i ← 1
        loop: while m(i) ≠ 1 do i ← i + 2
              if m(i-1) = 0 then i ← i + 1
                                 goto loop
              m(i) ← 0
              m(i+1) ← 1
              m(i+2) ← 1

Using these algorithms to perform a POP or a PUSHx operation on $\rho(a^n)$ or on
$\rho(b^n)$, we only make $\frac{|\rho(d)|}{2} + k_1$ accesses, for some constant $k_1 \in \mathbb{N}$. So $\rho$ is a
concatenation-preserving endmarker representation for which it is not always
necessary to make $|d|$ accesses. Note, however, that for $d = \{ab\}^n$ these algorithms
lead us to access every cell in $D(\rho(\{ab\}^n))$, a total of $\frac{3}{2} \cdot |d| + k_2$ accesses. ∎
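The scanning loop shared by the three algorithms of Example 6.3 can be traced in code to see which cells are actually read. The sketch below is illustrative: `scan` restates the `loop` of the pseudocode with an explicit record of the cells read, and `decode` is a hypothetical helper, added only to check the result of a POP.

```python
CODE = {"a": [0], "b": [1, 0]}
END = [1, 1]


def rho(d):
    cells = []
    for x in d:
        cells += CODE[x]
    return cells + END


def scan(cells):
    """Return (i, reads): i is the second cell of the endmarker 11,
    reads lists the cells inspected, in order."""
    reads = []

    def m(j):
        reads.append(j)
        return cells[j]

    i = 1
    while True:
        while m(i) != 1:
            i += 2                      # a 0 in cell i closes a codeword; skip cell i+1
        if m(i - 1) == 0:               # the 1 in cell i starts a new codeword
            i += 1
            continue
        return i, reads                 # cells i-1 and i hold the endmarker


def pop(cells):
    i, _ = scan(cells)
    if i == 1:
        return "Error"
    cells[i - 2] = 1                    # the endmarker now starts one codeword earlier


def decode(cells):
    """Hypothetical helper: read the stack back out of a representation."""
    d, j = "", 0
    while True:
        if cells[j] == 0:
            d, j = d + "a", j + 1
        elif cells[j + 1] == 0:
            d, j = d + "b", j + 2
        else:
            return d


for d in ("aaaaaa", "bbbbbb", "ababab"):
    cells = rho(d)
    print(d, len(cells), len(scan(cells)[1]))   # representation length vs. cells read
```

For the all-a and all-b stacks roughly half of the representation is read, while the alternating stack forces every cell to be read, which is the contrast the example draws.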

Although in the above example we were sometimes able to perform a POP or a
PUSHx operation in only $\frac{|d|}{2}$ accesses, we at other times were forced to make $\frac{3|d|}{2}$
accesses. Thus, it seems likely that there would still be an average cost of $|d|$
accesses, even though the worst case cost has been improved. If we were to
eliminate the requirement that $D(f(\varnothing)) \subseteq \bigcup_{x \in X} D(f(x))$, then we would lose storage
optimality but would be able to achieve lower access costs, as Example 6.4 shows.

Example 6.4. Let $X = \{a,b\}$, $\mathcal{B} = \{0,1,2\}$, and define the non storage optimal
function $f: X \cup \{\varnothing\} \to \mathcal{B}^*$ by

    f(a) = 0
    f(b) = 1
    f(∅) = 22

Let $\rho: \mathbb{D} \to \mathcal{B}^*$ be the concatenation-preserving endmarker representation, with fixed
position fields, defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{i-1} \cup \{22\}_{|d|}.$$

For instance,

    ρ(abaab) = 0100122
    ρ(bbab)  = 110122
    ρ(a)     = 022

Possible algorithms to implement POP and PUSHx operations are as follows:

    A_POP:    i ← 1
              while m(i) ≠ 2 do i ← i + 2
              if m(i-1) ≠ 2 then m(i-1) ← 2
              else if i = 1 then return "Error"
              else m(i-2) ← 2

    A_PUSHx:  i ← 1
              while m(i) ≠ 2 do i ← i + 2
              if m(i-1) ≠ 2 then m(i) ← f(x)
                                 m(i+2) ← 2
              else m(i-1) ← f(x)
                   m(i+1) ← 2

For all $d \in \mathbb{D}$, these algorithms have the following access costs:

$$\#[\mathcal{A}_{POP}(\rho(d))] = \frac{|d|}{2} + k_1$$

$$\#[\mathcal{A}_{PUSHx}(\rho(d))] = \frac{|d|}{2} + k_2,$$

for $k_1, k_2 \in \mathbb{N}$. ∎
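The effect of the doubled endmarker in Example 6.4 can be seen by simulating the two algorithms. The following sketch is illustrative only; the harness `run` and the assumption that back-to-back accesses to one cell count once are choices made here, not part of the text.

```python
CODE = {"a": 0, "b": 1}


def rho(d):
    return [CODE[x] for x in d] + [2, 2]


def run(op, d, *args):
    """Apply op to rho(d); return (result, final cells, access count)."""
    mem, trace = rho(d), []

    def read(i):
        trace.append(i)
        return mem[i]

    def write(i, v):
        trace.append(i)
        while len(mem) <= i:
            mem.append(0)               # unbounded memory; the filler value is arbitrary
        mem[i] = v

    result = op(read, write, *args)
    # Assumption: consecutive accesses to the same cell count as one access.
    count = sum(1 for j, a in enumerate(trace) if j == 0 or a != trace[j - 1])
    return result, mem, count


def pop(read, write):
    i = 1
    while read(i) != 2:                 # only odd cells are inspected
        i += 2
    if read(i - 1) != 2:
        write(i - 1, 2)                 # cell i is the first endmarker symbol
    elif i == 1:
        return "Error"
    else:
        write(i - 2, 2)                 # cell i is the second endmarker symbol


def push(read, write, x):
    i = 1
    while read(i) != 2:
        i += 2
    if read(i - 1) != 2:
        write(i, CODE[x])
        write(i + 2, 2)
    else:
        write(i - 1, CODE[x])
        write(i + 1, 2)


for n in (20, 21, 40, 41):
    print(n, run(pop, "a" * n)[2])      # grows by about one access per two stack elements
```

Because the endmarker occupies two cells, a scan that reads every other cell cannot step over it, which is what allows roughly half of the stack cells to be skipped.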

Theorem 6.1 made no mention of the TOP operation; in fact, the $|d| + 1$ result
does not necessarily hold for every $d \in \mathbb{D}$. We can see this by reconsidering
Example 6.1, which we do in the following example.

Example 6.5. Recall the representation $\rho$ from Example 6.1. We presented there an
algorithm $\mathcal{A}_{TOP}$ which required $|d| + 1$ accesses, for all $d \in \mathbb{D}$. We now show that
we can sometimes do better than $|d| + 1$. For instance, consider

    ρ(abbaaba) = 01100102.

From Theorem 6.1, we know that any algorithms for $\mathcal{A}_{POP}$ and $\mathcal{A}_{PUSHx}$ will access
at least $|d| + 1$ memory cells, for all $d \in \mathbb{D}$. Let us construct an algorithm for $\mathcal{A}_{TOP}$.
Suppose our algorithm first accesses cell 7. Since m(7) = 2, cell 7 must contain the
endmarker, if cell 7 is part of $\rho(d)$. By reading m(6), we know that $q_{TOP} = a$ if
$7 \in D(\rho(d))$. Of course, if $|d| < 7$ then it is possible that $q_{TOP} = b$. In order to
verify that $q_{TOP} = a$ we need only access cells m(0), m(2), m(3), m(5), m(6). In
particular, we don't need to access m(1) or m(4), because we already know that
m(0) = m(3) = 0. So upon locating the occurrence of the endmarker in cell 7, we
conjecture that $q_{TOP} = a$. If m(1) = 2 or m(4) = 2 then we still have $q_{TOP} = a$.

Thus, we have an example where it is possible to sometimes determine $q_{TOP}$ in
fewer than $|d| + 1$ accesses. Notice, however, that an algorithm such as we
presented here would for some d require more than $|d| + 1$ accesses; in particular,
if the m(7) we originally accessed were not in our representation. ∎

In determining $q_{TOP}(d)$, the trick used in Example 6.5 could allow us to access, for
some $d \in \mathbb{D}$, as few as $\left\lceil \frac{|d|-1}{2} \right\rceil + 2$ memory cells. In other words, we would
always access the endmarker field and the field corresponding to the top stack
element. At best we would only have to access half of the remaining $|d| - 1$ cells.
The following theorem shows that it is never possible to do better.

Theorem 6.2. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f: X \cup \{\varnothing\} \to \mathcal{B}^+$. Let
$\rho: \mathbb{D} \to \mathcal{B}^+$ be any TOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(\varnothing)\}_{n_{|d|+1}}.$$

Then for all $d \in \mathbb{D}$ such that $|d| \geq 1$ and for any implementation, $\mathcal{A}_{TOP}$, of a
TOP operation:

$$\#[\mathcal{A}_{TOP}(\rho(d))] \geq \left\lceil \frac{|d|-1}{2} \right\rceil + 2.$$

Proof: By Lemma 6.3, we know that the $n_{|d|+1}$ field must always be accessed.
Also, it is necessary to access the $n_{|d|}$ field, since this is the value we want to
determine. So the result clearly holds for $|d| = 1$ and, by also using Lemma 6.2, for
$|d| = 2$. Consider the case where $|d| > 2$. We know that we must access the fields
$n_{|d|+1}$ and $n_{|d|}$. Now assume we have an algorithm for $f_{TOP}$, $\mathcal{A}_{TOP}$, that for some
$d_0 \in \mathbb{D}$ returns the value $q_{TOP}(d_0) = x_1$ for some $x_1 \in X$ and for which there
exists $k \in \mathbb{N}$, $1 \leq k \leq |d_0| - 1$, such that $\mathcal{A}_{TOP}$ accesses neither the $n_k$ nor the $n_{k+1}$
field. Let $m_0 \supseteq \rho(d_0)$. Define a new memory state $m_1$ such that $m_1$ differs from
$m_0$ only in the $n_k$ field, which contains $f(x_2)$ (for $x_2 \neq x_1$), and in the $n_{k+1}$
field, which contains the endmarker. Then $m_1 \supseteq \rho(d_1)$, where

$$d_1(i) = \begin{cases} d_0(i) & \text{for } i < k \\ x_2 & \text{for } i = k \end{cases}$$

Then, using the algorithm $\mathcal{A}_{TOP}$, we must get $\mathcal{A}_{TOP}(\rho(d_1)) = x_1$. But we know
that $f_{TOP}(d_1) = x_2$. This results in a contradiction, which means that for any
valid algorithm $\mathcal{A}_{TOP}$ it is never possible to not access two consecutive fields $n_k$ and
$n_{k+1}$, for $1 \leq k \leq |d| - 1$. Since by Lemma 6.2 we must always access the $n_1$ field,
this says that we must make at least $\left\lceil \frac{|d|-1}{2} \right\rceil + 2$ accesses. ∎
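The reasoning of Example 6.5 can be checked by brute force and set against the bound of Theorem 6.2. The sketch below is an illustration only: it fixes the six cells that the example reads in ρ(abbaaba) = 01100102, tries every possible content of the two unread cells, and confirms that the TOP answer is the same in each case. The names `observed`, `unread`, and `top` are introduced here.

```python
from itertools import product
from math import ceil

# Cells read in Example 6.5 and the values found there (representation of Example 6.1).
observed = {0: 0, 2: 1, 3: 0, 5: 1, 6: 0, 7: 2}
unread = [1, 4]


def top(cells):
    """TOP for the representation of Example 6.1: the leftmost 2 is the endmarker."""
    end = cells.index(2)
    if end == 0:
        return "Error"
    return "a" if cells[end - 1] == 0 else "b"


answers = set()
for values in product([0, 1, 2], repeat=len(unread)):
    cells = [None] * 8
    for i, v in observed.items():
        cells[i] = v
    for i, v in zip(unread, values):
        cells[i] = v
    answers.add(top(cells))

bound = ceil((7 - 1) / 2) + 2            # Theorem 6.2 with |d| = 7

print(answers, len(observed), bound)
```

Every completion yields the same answer, so six accesses suffice for this stack of length 7; that is fewer than the |d| + 1 = 8 of the left-to-right algorithm and consistent with the lower bound computed on the last line.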



From Theorem 5.10 we immediately know that a TOS endmarker
representation achieves Kraft storage when the function f does.

Theorem 6.2. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider the function $f: X \cup \{\varnothing\} \to \mathcal{B}^+$. If
the function f achieves Kraft storage, then the TOS endmarker representation
$\rho: \mathbb{D} \to \mathcal{B}^+$, defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(\varnothing)\}_{n_{|d|+1}},$$

also achieves Kraft storage.

Before we conclude this section, let us say something about finite memories,
$L < \infty$. In our definition of a TOS endmarker representation we, for simplicity,
considered infinite domains and assumed that we would never run out of memory
space. Allowing $L$ to be finite would not have changed our results, except perhaps
when $|\rho(d)| = L$, although our algorithms would, of course, have to be modified.
Also, recall from Section 5.3 that an endmarker representation cannot achieve Kraft
storage for finite $L$. If we had wanted to allow finite $L$ we perhaps would have
chosen to extend the definition of a TOS endmarker representation as in the
following example.

Example 6.6. Recall Example 6.1, where $X = \{a, b\}$, $S = \{0, 1, 2\}$, and the function
$f: X \cup \{\emptyset\} \to S^*$ is defined by

    f(a) = 0
    f(b) = 1
    f(∅) = 2.

Assume, however, that $L = 4$ and that $\mathbb{D} = \bigcup_{i=0}^{4} X^i$. We could define a
representation $\rho: \mathbb{D} \to S^*$ by

$$\rho(d) = \begin{cases}
\bigcup_{i=1}^{|d|} \{f(d(i))\}_{i-1} \cup \{\emptyset\}_{|d|} & \text{for } |d| \le 3 \\
\bigcup_{i=1}^{|d|} \{f(d(i))\}_{i-1} & \text{for } |d| = 4.
\end{cases}$$

Then

    $\rho(\lambda) = 2\_\_\_$
    $\rho(abb) = 0112$
    $\rho(babb) = 1011$

Notice that, using this definition, every possible memory state is a representation of
some stack and $\rho$ achieves Kraft storage. The stack operations can be implemented
essentially as they were in Example 6.1, but we have to watch for $|d| = L$.



    α_POP:   i ← 0
             while m(i) ≠ 2 do if i = L - 1 then m(i) ← 2
                                                 return
                               else i ← i + 1
             if i = 0 then return "Error"
             m(i-1) ← 2

    α_PUSHx: i ← 0
             while m(i) ≠ 2 do if i = L - 1 then return "Error"
                               else i ← i + 1
             m(i) ← f(x)
             if i ≠ L - 1 then m(i+1) ← 2

    α_TOP:   i ← 0
             while m(i) ≠ 2 do if i = L - 1 then temp ← m(i)
                                                 goto decode
                               else temp ← m(i)
                                    i ← i + 1
    decode:  if i = 0 then return "Error"
             else if temp = 0 then return "a"
                  else return "b"



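These algorithms give the following access costs:

$$\#[\alpha_{\mathrm{POP}}(\rho(d))] = \begin{cases}
1 & \text{if } |d| = 0 \\
|d| + 2 & \text{if } 0 < |d| < L \\
|d| & \text{if } |d| = L
\end{cases}$$

$$\#[\alpha_{\mathrm{PUSHx}}(\rho(d))] = \begin{cases}
|d| + 2 & \text{if } |d| + 1 < L \\
|d| + 1 & \text{if } |d| + 1 = L \\
|d| & \text{if } |d| = L
\end{cases} \;=\; \min\{|d| + 2, L\}$$

$$\#[\alpha_{\mathrm{TOP}}(\rho(d))] = \begin{cases}
|d| + 1 & \text{if } |d| < L \\
|d| & \text{if } |d| = L
\end{cases} \;=\; \min\{|d| + 1, L\}$$

∎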
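The finite-memory algorithms of Example 6.6 can be checked mechanically. The following is a minimal sketch, not part of the original text: the class `CountingMemory`, the helpers `rho` and `costs`, and the choice to fill unused cells with 0 are all assumptions made for illustration. One call to `access` models one access in the sense of Chapter 3 (read a cell, then possibly rewrite it).

```python
END = 2                      # f(endmarker) from Example 6.6
CODE = {"a": 0, "b": 1}
DECODE = {0: "a", 1: "b"}


class CountingMemory:
    """Hypothetical helper: L cells plus a counter of cell accesses."""

    def __init__(self, cells):
        self.cells = list(cells)
        self.accesses = 0

    def access(self, i, rewrite=None):
        # One access = read cell i, then optionally rewrite it.
        # `rewrite` maps the old value to a new one, or to None to keep it.
        self.accesses += 1
        old = self.cells[i]
        new = rewrite(old) if rewrite is not None else None
        if new is not None:
            self.cells[i] = new
        return old


def rho(d, L):
    """Cells of rho(d); cells past the endmarker are arbitrary (0 here)."""
    cells = [CODE[x] for x in d]
    if len(d) < L:
        cells.append(END)
    return CountingMemory(cells + [0] * (L - len(cells)))


def pop(mem):
    L = len(mem.cells)
    for i in range(L):
        if i == L - 1:
            # A full memory has no endmarker: overwrite the last element.
            v = mem.access(i, lambda old: None if old == END else END)
            if v != END:
                return "ok"
        else:
            v = mem.access(i)
        if v == END:
            if i == 0:
                return "Error"
            mem.access(i - 1, lambda old: END)
            return "ok"


def push(mem, x):
    L = len(mem.cells)
    for i in range(L):
        # Reading the endmarker and replacing it by f(x) is one access.
        v = mem.access(i, lambda old: CODE[x] if old == END else None)
        if v == END:
            if i != L - 1:
                mem.access(i + 1, lambda old: END)
            return "ok"
    return "Error"


def top(mem):
    temp = None
    for i in range(len(mem.cells)):
        v = mem.access(i)
        if v == END:
            break
        temp = v
    return "Error" if temp is None else DECODE[temp]


def costs(d, L=4):
    """Access counts (POP, PUSHa, TOP), each measured on a fresh rho(d)."""
    result = []
    for op in (pop, lambda m: push(m, "a"), top):
        mem = rho(d, L)
        op(mem)
        result.append(mem.accesses)
    return tuple(result)
```

For instance, `costs("abb")` measures the three operations on the memory state 0112; under the assumptions above the counts agree with the case formulas given for this example.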



Thus, we certainly could have considered finite memory spaces, but the extra
complication in our algorithms would not have increased our understanding of TOS
endmarker representations. Similarly, in the next section we always make the
assumption that $L$ is infinite. In Section 6.4, where we discuss pointer
representations for stacks, we shall consider both finite and infinite $L$.

In this section we have examined perhaps the most obvious stack endmarker
representation scheme, the TOS endmarker representation. We know as a
consequence of Theorem 6.2 that it is possible for such a representation to achieve
Kraft storage, but we have also shown that any implementation will result in
expensive access costs for every $d \in \mathbb{D}$. In particular,

$$\#[\alpha_{\mathrm{POP}}(\rho(d))] \ge |d| + 1$$
$$\#[\alpha_{\mathrm{PUSHx}}(\rho(d))] \ge |d| + 2$$
$$\#[\alpha_{\mathrm{TOP}}(\rho(d))] \ge \left\lceil \frac{|d| - 1}{2} \right\rceil + 2,$$

for any algorithms $\alpha_{\mathrm{POP}}$, $\alpha_{\mathrm{PUSHx}}$, $\alpha_{\mathrm{TOP}}$
implementing the stack operations POP, PUSHx, and TOP. This leads us to wonder whether
some other type of endmarker representation could result in cheaper access costs. The
POP and PUSHx operations involve updating the memory contents, but the TOP operation
is just a question. Suppose we were to keep the top of the stack at some fixed location.
Such a representation scheme is discussed in the next section.

6.3 The BOS Endmarker Representation



Consider an endmarker representation of a stack in which the top of the stack
is always at a fixed (known) location and the bottom of the stack is allowed to
vary. In this case, the endmarker denotes the bottom of the stack. The following
example illustrates one possible such implementation.



Example 6.7. Consider the function $f$ from Example 6.1, where we have
$X = \{a, b\}$, $S = \{0, 1, 2\}$, $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, and where we
define the function $f: X \cup \{\emptyset\} \to S^*$ by

    f(a) = 0
    f(b) = 1
    f(∅) = 2.

Define the concatenation-preserving endmarker representation $\rho: \mathbb{D} \to S^+$ by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{\emptyset\}_{n(d)},$$

where $n_i(d) = |d| - i$ and $n(d) = |d|$.

In this representation, the endmarker indicates when we have reached the bottom
of the stack. Reading the memory contents "from left to right" corresponds to
reading the elements in the stack from the top down. For instance,

    $\rho(\lambda) = 2$
    $\rho(abaa) = 00102$
    $\rho(baabba) = 0110012$

It is certainly easy to perform a TOP operation, since we need only read $m(0)$.

    α_TOP: if m(0) = 2 then return "Error"
           else if m(0) = 0 then return "a"
                else return "b"

On the other hand, consider performing a PUSHb operation on $d = ababa$:

    $\rho(d) = \rho(ababa) = 010102$
    $\rho(u_{\mathrm{PUSHb}}(d)) = \rho(ababab) = 1010102$
Notice that it will certainly be necessary to access $|d| + 2$ cells, since this many cells
are actually rewritten. Intuitively, we want to set $m(0) \leftarrow 1$ and to shift the
contents of each cell in $\rho(d)$ right by one. Recalling the notation introduced in
Chapter 3, one implementation scheme would have the access sequence

$$\underline{0}, \underline{1}, \underline{2}, \ldots, \underline{|d|-1}, \underline{|d|}, \underline{|d|+1}.$$

One possible algorithm is the following.

    α_PUSHx: i ← 0
             temp1 ← f(x)
             while m(i) ≠ 2 do temp1 ↔ m(i)
                               i ← i + 1
             m(i) ← temp1
             m(i+1) ← 2

Notice that we have made use of the additional register temp1, as we did in
Example 6.1. Recall also that in Chapter 3 we defined a single access to consist of
reading and then possibly rewriting a cell. Thus, we have written

    temp1 ↔ m(i)

to indicate a single access to $m(i)$, where the old contents of $m(i)$ is stored in temp1
and the old contents of temp1 is stored in $m(i)$. We refer to this as an exchange,
and might have written it out using a second temporary location, temp2:

    temp2 ← m(i)
    m(i) ← temp1
    temp1 ← temp2

Now consider performing a POP operation on $d = ababab$:

    $\rho(d) = \rho(ababab) = 1010102$
    $\rho(u_{\mathrm{POP}}(d)) = \rho(ababa) = 010102$

As for PUSHx, a POP operation will have to rewrite $|d|$ cells and so at least $|d|$
accesses will be required. In this case, we intuitively want to shift the contents of
all of the cells in $\rho(d)$ left by one. One scheme for doing this would have the
access sequence

$$0, 1, 2, \ldots, |d|-1, |d|, \underline{|d|-1}, \underline{|d|-2}, \ldots, \underline{2}, \underline{1}, \underline{0},$$

and could be implemented using the exchange operation described above.

    α_POP: if m(0) = 2 then return "Error"
           i ← 1
           while m(i) ≠ 2 do i ← i + 1
           temp1 ← 2
           while i > 0 do i ← i - 1
                          m(i) ↔ temp1

∎
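The three algorithms of Example 6.7 can be traced with a small simulation. This is a sketch under stated assumptions rather than the author's code: the `Memory` class, its `read` and `exchange` methods, and the `rho` helper (which pads the memory with two spare cells) are hypothetical names introduced here. `exchange` models the single-access exchange described above.

```python
END = 2                      # f(endmarker) from Example 6.7
CODE = {"a": 0, "b": 1}
DECODE = {0: "a", 1: "b"}


class Memory:
    """Hypothetical helper: cells plus a counter of cell accesses."""

    def __init__(self, cells):
        self.cells = list(cells)
        self.accesses = 0

    def read(self, i):
        self.accesses += 1
        return self.cells[i]

    def exchange(self, i, value):
        # One access: store `value` in cell i and hand back the old contents.
        self.accesses += 1
        old, self.cells[i] = self.cells[i], value
        return old


def rho(d, spare=2):
    """BOS layout: top of stack in cell 0, endmarker after the bottom."""
    return Memory([CODE[x] for x in reversed(d)] + [END] + [0] * spare)


def top(mem):
    v = mem.read(0)
    return "Error" if v == END else DECODE[v]


def push(mem, x):
    i, temp1 = 0, CODE[x]
    while True:
        temp1 = mem.exchange(i, temp1)   # shift one cell to the right
        if temp1 == END:                 # the old endmarker was just replaced
            break
        i += 1
    mem.exchange(i + 1, END)


def pop(mem):
    if mem.read(0) == END:
        return "Error"
    i = 1
    while mem.read(i) != END:            # first pass: find the endmarker
        i += 1
    temp1 = END
    while i > 0:                         # second pass: shift left by one
        i -= 1
        temp1 = mem.exchange(i, temp1)
    return "ok"
```

Running `push(rho("ababa"), "b")` performs $|d| + 2 = 7$ accesses, and the two-pass `pop` on the resulting state performs $2|d| + 1 = 13$ accesses, in line with the access sequences given in the example.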

We refer to a representation such as $\rho$ in Example 6.7 as a BOS endmarker
representation, because the endmarker is always situated in the field following
that field which contains the bottom stack element; i.e., in the set of cells which the
bottom stack element would occupy if the stack had another element in it.

Definition. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function
$f: X \cup \{\emptyset\} \to S^+$. Let $\rho: \mathbb{D} \to S^+$ be any endmarker
representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\emptyset\}_{n_{|d|+1}},$$

where $n_i \in \mathbb{N}$, for any $i \in \mathbb{N}^+$, and $f(\emptyset) = \emptyset$. If
$D(\emptyset) \subset \bigcup_{x \in X} D(f(x))$, then we refer to $\rho$ as a bottom of
stack (BOS) endmarker representation.

The definition of a BOS endmarker representation is basically the same as that of a
TOS endmarker representation, except that $d(i)$ is located in field $n_{|d|+1-i}$, rather
than in field $n_i$. In other words, the order of the representations of the stack
elements is reversed. The representation is easiest to visualize when $n_{i+1} > n_i$ and
each field consists of contiguous memory cells, but no such requirements are
imposed by the definition.

The BOS endmarker representation was motivated by an attempt to decrease
the access cost for performing a TOP operation. As we shall see, however, we have
not altered the access cost for PUSHx and we have actually worsened, for all
$d \in \mathbb{D}$, the lower bound access cost for POP:

$$\#[\alpha_{\mathrm{TOP}}(\rho(d))] \ge 1$$
$$\#[\alpha_{\mathrm{PUSHx}}(\rho(d))] \ge |d| + 2$$
$$\#[\alpha_{\mathrm{POP}}(\rho(d))] \ge \left\lceil \frac{3|d| + 1}{2} \right\rceil = \left\lfloor \frac{3|d|}{2} \right\rfloor + 1.$$

Theorem 6.3. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function
$f: X \cup \{\emptyset\} \to S^+$. Let $\rho: \mathbb{D} \to S^+$ be any BOS endmarker
representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\emptyset\}_{n_{|d|+1}}.$$

Then for all $d \in \mathbb{D}$ any implementation of a PUSHx operation requires at least
$|d| + 2$ memory cell accesses; i.e., for all $d \in \mathbb{D}$,

$$\#[\alpha_{\mathrm{PUSHx}}(\rho(d))] \ge |d| + 2.$$

Proof: Assume there exists some algorithm $\alpha_{\mathrm{PUSHx}}$ for performing a PUSHx
operation and some $d_0 \in \mathbb{D}$ such that
$\#[\alpha_{\mathrm{PUSHx}}(\rho(d_0))] < |d_0| + 2$. By the definition of a PUSHx
operation we know that

$$\rho(u_{\mathrm{PUSHx}}(d_0)) = \bigcup_{i=1}^{|d_0|} \{f(d_0(i))\}_{n_{|d_0|+2-i}} \cup \{f(x)\}_{n_1} \cup \{\emptyset\}_{n_{|d_0|+2}}.$$

Certainly the values in fields $n_{|d_0|+1}$ and $n_{|d_0|+2}$ must be accessed. Assume that
there is some $p$, $1 \le p \le |d_0|$, such that $\alpha_{\mathrm{PUSHx}}(\rho(d_0))$ does
not access field $n_p$. Let $m_0$ be a memory state such that $m_0 \supseteq \rho(d_0)$, and
as in the proof of Lemma 6.3, let $m_1$ be a memory state which is identical to $m_0$
except in the $n_p$ field, where $\emptyset$ is stored. If $m_1 \supseteq \rho(d_1)$, then the
algorithm $\alpha_{\mathrm{PUSHx}}$ does not distinguish $d_0$ and $d_1$ and thus
$\alpha_{\mathrm{PUSHx}}$ does not correctly perform a PUSHx operation on $d_1$, a
contradiction. So any algorithm $\alpha_{\mathrm{PUSHx}}$ must always access the $|d|$
fields $n_1, n_2, \ldots, n_{|d|}$, as well as the fields $n_{|d|+1}$ and $n_{|d|+2}$. ∎

As a consequence of this theorem, we know that the algorithm $\alpha_{\mathrm{PUSHx}}$ in
Example 6.7 is optimal; in fact, we know that for no $d \in \mathbb{D}$ is it possible to
make fewer than $|d| + 2$ accesses.

Let us now consider the construction of an algorithm for the POP operation.
Using the scheme presented in Example 6.7, we could read the $n_i$ fields essentially
from left to right until we reach the bottom-of-stack endmarker, and then shift the
elements in the representation "left one field". This corresponds to a field access
sequence

$$1, 2, \ldots, |d|-1, |d|, |d|+1, \underline{|d|}, \underline{|d|-1}, \ldots, \underline{2}, \underline{1}.$$

If we choose not to read all the way to the endmarker and then backtrack, we could
use an algorithm with a field access sequence

$$1, 2, \underline{1}, 3, \underline{2}, 4, \underline{3}, \ldots, |d|, \underline{|d|-1}, |d|+1, \underline{|d|}.$$

Either of these algorithms would, however, require making a total of $2|d| + 1$
accesses, and we shall show that it is possible to (always) do better. In order to
motivate the lower bound we shall obtain for the POP operation, we indicate how
the algorithm $\alpha_{\mathrm{POP}}$ in Example 6.7 could be improved.

Example 6.8. Recall the representation $\rho$ from Example 6.7 and consider
performing a POP operation on $d_0 = abaa$. We know that

    $\rho(d_0) = 00102$
    and $\rho(u_{\mathrm{POP}}(d_0)) = 0102$.

Recall that our definition of access allows us to read and then, if we choose, rewrite
a cell. So suppose we first access cell 1. Since $m(1) = 0$, we put $f(a) = 0$ into cell 0,
checking, of course, that cell 0 is not the end of the stack. We then read cell 3.
Since $m(3) = 0 \ne 2$, we go back to cell 2, which we now read. Since $m(2) \ne 0$, we
write $f(a) = 0$ into cell 2. We already know that $m(1) \ne 1$, and so we set $m(1)$ to the
old contents of cell 2. At this point we have (correctly) rewritten $m(0)$, $m(1)$, $m(2)$.
We now read cell 5. For the case we are considering, $m(5)$ is not included in
$\rho(d_0)$, so cell 5 might or might not contain the endmarker 2. In either case, we back
up and read cell 4, at which time we find that $m(4) = 2$. Having already read cells
0, 1, 2, 3, we now know that cell 4 contains the BOS endmarker. So we set
$m(3) \leftarrow 2$ and are done. Using this procedure we have the memory cell access
sequence

$$1, \underline{0}, 3, \underline{2}, \underline{1}, 5, \underline{4}, \underline{3}, 7, \underline{6}, \underline{5}, 9, \underline{8}, \underline{7}, 11, \underline{10}, \underline{9}, \ldots$$
We might write the algorithm out as follows, making use of two temporary
locations, temp1 and temp2.

    α_POP: temp1 ← m(1)
           if m(0) = 2 then return "Error"
           else m(0) ← temp1
           if temp1 = 2 then return
           i ← 3
           while m(i) ≠ 2 do temp1 ← m(i)
                             temp2 ← m(i-1)
                             m(i-2) ← temp2
                             if temp2 = 2 then return
                             else m(i-1) ← temp1
                             i ← i + 2
           temp2 ← m(i-1)
           m(i-2) ← temp2
           if temp2 = 2 then return
           else m(i-1) ← 2



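This algorithm results in an access cost of

$$\#[\alpha_{\mathrm{POP}}(\rho(d))] = \begin{cases}
3 \cdot \frac{|d|}{2} + 2 & \text{for } |d| \text{ even} \\
3 \cdot \frac{|d| - 1}{2} + 2 & \text{for } |d| \text{ odd.}
\end{cases}$$

We shall shortly prove that the algorithm is, in fact, optimal. ∎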
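The improved POP of Example 6.8 can be simulated to check its access count. The sketch below is illustrative only: `CountingMemory`, `rho`, and the `pad` parameter (the arbitrary value stored beyond the endmarker) are assumptions introduced here, and each `access` call models one read followed by an optional rewrite of the same cell, so that reading cell $i-1$ and rewriting it count as a single access.

```python
END = 2                      # f(endmarker)
CODE = {"a": 0, "b": 1}


class CountingMemory:
    """Hypothetical helper: cells plus a counter of cell accesses."""

    def __init__(self, cells):
        self.cells = list(cells)
        self.accesses = 0

    def access(self, i, rewrite=None):
        # One access = read cell i, then optionally rewrite it.
        self.accesses += 1
        old = self.cells[i]
        new = rewrite(old) if rewrite is not None else None
        if new is not None:
            self.cells[i] = new
        return old


def rho(d, pad=0):
    """BOS layout with two arbitrary cells (value `pad`) after the endmarker."""
    return CountingMemory([CODE[x] for x in reversed(d)] + [END] + [pad, pad])


def pop_improved(mem):
    temp1 = mem.access(1)
    # Cell 0: if it is not the endmarker, overwrite it with the old cell 1.
    if mem.access(0, lambda old: None if old == END else temp1) == END:
        return "Error"
    if temp1 == END:
        return "ok"
    i = 3
    while True:
        temp1 = mem.access(i)            # may lie beyond the representation
        # Cell i-1: rewritten only if it is a stack element, not the endmarker.
        temp2 = mem.access(i - 1, lambda old: None if old == END else temp1)
        mem.access(i - 2, lambda old: temp2)
        if temp1 == END or temp2 == END:
            return "ok"
        i += 2


def expected_cost(n):
    """3*floor(n/2) + 2, the cost stated for both parities of n = |d|."""
    return 3 * (n // 2) + 2
```

For $d_0 = abaa$ the simulation touches cells 1, 0, 3, 2, 1, 5, 4, 3 in that order, which is eight accesses and matches the walk-through in the example.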



In order to derive a lower bound access cost result for performing a POP
operation we begin by proving two lemmas. Recall that our definition of access
allows us to read, although certainly not rewrite, a memory cell which is being used
by another user.

Lemma 6.4. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function
$f: X \cup \{\emptyset\} \to S^+$. Let $\rho: \mathbb{D} \to S^+$ be any BOS endmarker
representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\emptyset\}_{n_{|d|+1}}.$$

Then a cell in field $n_i$, $i > 1$, cannot be rewritten unless each of the fields
$n_1, \ldots, n_{i-1}$ has been accessed.

Proof: A cell in field $n_i$ cannot be rewritten unless it is known that $|d| \ge i - 1$;
i.e., we are not allowed to rewrite field $n_i$ if it is in some other user's memory space.
Thus, in order to rewrite a cell in field $n_i$, it must be the case that no $n_j$, for
$1 \le j < i$, contains the endmarker $\emptyset$. There is no way to guarantee this without
accessing each of the fields $n_1, n_2, \ldots, n_{i-1}$. ∎

The following lemma essentially tells us that field $n_i$ cannot be rewritten until field
$n_{i+1}$ has been accessed.

Lemma 6.5. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function
$f: X \cup \{\emptyset\} \to S^+$. Let $\rho: \mathbb{D} \to S^+$ be any BOS endmarker
representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\emptyset\}_{n_{|d|+1}}.$$

Consider any algorithm, $\alpha_{\mathrm{POP}}$, for the operation POP. Then there must be
an access to field $n_{i+1}$ made previous to the last rewrite of field $n_i$, for
$1 \le i \le |d|$.

Proof: Recalling the definitions of the BOS endmarker and the POP operation,

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\emptyset\}_{n_{|d|+1}}$$

and

$$\rho(u_{\mathrm{POP}}(d)) = \bigcup_{i=1}^{|d|-1} \{f(d(i))\}_{n_{|d|-i}} \cup \{\emptyset\}_{n_{|d|}}.$$

So $f(d(i))$ gets moved from field $n_{|d|+1-i}$ to field $n_{|d|-i}$. Since we can determine
the contents of field $n_{|d|+1-i}$ only by making at least one access to that field, field
$n_{|d|+1-i}$ must be read before its value can be put into field $n_{|d|-i}$. ∎

For any algorithm $\alpha_{\mathrm{POP}}$ we can consider its corresponding field access
sequence. We prove our lower bound result by lower bounding the size of a sequence
which meets the conditions presented in Lemmas 6.4 and 6.5. We first make the
following definition.

Definition. For $k, i \in \mathbb{N}$, define a set $S_{k,i}$ as follows:

$$S_{k,i} = \{k, k+1, \ldots, k+i-1, k+i, \underline{k}, \underline{k+1}, \ldots, \underline{k+i-1}\}.$$

We say that a sequence is an $s(k,i)$-sequence if it contains each of the terms
$\underline{k}, \underline{k+1}, \ldots, \underline{k+i-1}, k+i$, if each term in the sequence is
in $S_{k,i}$, and if the following conditions are satisfied:

(i) For all $r$, $k \le r < k+i$, the last occurrence of $\underline{r}$ is preceded by
    $r+1$ or $\underline{r+1}$.
(ii) For all $r$, $k \le r < k+i$, the last occurrence of $\underline{r}$ is preceded by
    $j$ or $\underline{j}$, for every element $j \in \{k, k+1, \ldots, r-2, r-1\}$.

We define $\sigma(k,i)$ to be an $s(k,i)$-sequence of minimal length, so that

$$|\sigma(k,i)| = \min_{s(k,i)} |s(k,i)|.$$

Since $|\sigma(0,|d|)|$ is minimal over all sequences $s(0,|d|)$, $\sigma(0,|d|)$ corresponds
to an optimal access order for performing a POP operation.

Lemma 6.6. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function
$f: X \cup \{\emptyset\} \to S^+$. Let $\rho: \mathbb{D} \to S^+$ be any BOS endmarker
representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\emptyset\}_{n_{|d|+1}}.$$

Then for any algorithm, $\alpha_{\mathrm{POP}}$, which implements the operation POP, and
for all $d \in \mathbb{D}$:

$$\#[\alpha_{\mathrm{POP}}(\rho(d))] \ge |\sigma(0,|d|)|.$$

Proof: Recalling the definition of a field access sequence, the proof follows directly
from Lemmas 6.4 and 6.5 and from the definition of $\sigma(0,|d|)$. ∎

Now that we have established the correspondence between a sequence $\sigma(0,|d|)$
and $\#[\alpha_{\mathrm{POP}}(\rho(d))]$, we have the notation with which we prove our lower
bound result. We prove this as a consequence of three lemmas.

Lemma 6.7. For $k \in \mathbb{N}$, $k \ge 0$:
    (a) $|\sigma(k,0)| = 1$
    (b) $|\sigma(k,1)| = 2$

Proof: (a) We want the minimal length of a sequence $\sigma(k,0)$ containing $k$ and
satisfying conditions (i) and (ii) in the definition of $\sigma$; $k$ is such a sequence and
therefore $|\sigma(k,0)| = 1$.

(b) We want the minimal length of a sequence $\sigma(k,1)$ containing $\underline{k}$,
$k+1$, and satisfying (i) and (ii); $k+1, \underline{k}$ is such a sequence and clearly must
be minimal. Thus, $|\sigma(k,1)| = 2$. ∎

Lemma 6.8. For $i \in \mathbb{N}$, $i \ge 0$, $|\sigma(0,i+2)| = 3 + |\sigma(0,i)|$.

Proof: $\sigma(0,i+2)$ is a minimal length sequence containing
$\underline{0}, \underline{1}, \ldots, \underline{i}, \underline{i+1}, i+2$, and
$\sigma(2,i+2)$ is a minimal length sequence containing
$\underline{2}, \underline{3}, \ldots, \underline{i}, \underline{i+1}, i+2$ (both
satisfying the above conditions (i) and (ii)).

(i) We first show that $|\sigma(0,i+2)| \le 3 + |\sigma(0,i)|$.
Suppose we have some minimal length sequence $\sigma(2,i+2)$. We convert this into a
sequence $\sigma(0,i+2)$ by considering two cases:

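(a) Assume that $\underline{2}$ is preceded by $2$ in the sequence $\sigma(2,i+2)$.
    Immediately following $2$, insert $0, \underline{1}, \underline{0}$ into the sequence.

(b) Assume that $\underline{2}$ is not preceded by $2$ in the sequence. (Then it must be
    the case that $\underline{2}$ is the first field written.) Before $\underline{2}$,
    insert $1, \underline{0}$, and after $\underline{2}$ insert $\underline{1}$.

Thus, $|\sigma(0,i+2)| \le 3 + |\sigma(2,i+2)| = 3 + |\sigma(0,i)|$.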

(ii) We now show that $|\sigma(0,i+2)| \ge 3 + |\sigma(0,i)|$.
A minimal sequence $\sigma(0,i+2)$ must contain $\underline{0}, \underline{1}$, and a
minimal sequence $\sigma(2,i+2)$ will not contain these. But 1 must appear before
$\underline{0}$, and 0 (as well as 2) must appear before $\underline{1}$. Therefore, it is
necessary to include 0 or 1 in order to have $\underline{0}, \underline{1}$.
Thus, $|\sigma(0,i+2)| \ge 3 + |\sigma(0,i)|$. ∎

Using the previous two lemmas, we can compute $|\sigma(0,i)|$.

Lemma 6.9. For $i \in \mathbb{N}$, $i \ge 0$, we have:
    (a) $|\sigma(0,2i)| = 3i + 1$
    (b) $|\sigma(0,2i+1)| = 3i + 2$.

Proof: From Lemma 6.8, $|\sigma(0,i+2)| = 3 + |\sigma(0,i)|$. Applying this relation $i$
times and then using Lemma 6.7, an even second argument gives

$$|\sigma(0,2i)| = 3i + |\sigma(0,0)| = 3i + 1,$$

and an odd second argument gives

$$|\sigma(0,2i+1)| = 3i + |\sigma(0,1)| = 3i + 2.$$

∎
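The values in Lemma 6.9 can be cross-checked by exhaustive search for small stacks. The sketch below rests on one reading of conditions (i) and (ii), stated here as an assumption: a rewrite of cell $c$ can serve as its last rewrite only if cell $c+1$ and every cell below $c$ were accessed earlier in the sequence. The function name `min_sequence_length` and the state encoding are hypothetical. A breadth-first search over pairs (cells accessed so far, cells already validly rewritten) finds the length of a shortest sequence over cells $0, \ldots, n$.

```python
from collections import deque


def min_sequence_length(n):
    """Length of a shortest access sequence that rewrites cells 0..n-1 and
    accesses cell n, under the reading of conditions (i) and (ii) above."""
    all_rewritten = frozenset(range(n))
    start = (frozenset(), frozenset())
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (accessed, done), length = queue.popleft()
        if done == all_rewritten and n in accessed:
            return length
        for c in range(n + 1):
            new_done = done
            # The rewrite of c counts only if c+1 and all lower cells
            # were accessed strictly before this access.
            if c < n and (c + 1) in accessed and all(j in accessed for j in range(c)):
                new_done = done | {c}
            state = (accessed | {c}, new_done)
            if state not in seen:
                seen.add(state)
                queue.append((state, length + 1))
    return None


def lemma_6_9(n):
    """3i + 1 for n = 2i and 3i + 2 for n = 2i + 1."""
    return 3 * (n // 2) + 1 + (n % 2)
```

Under this model the search agrees with the lemma for every $n$ that is cheap to enumerate; it is a sanity check of the counting argument and not a substitute for the proof.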



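We now apply this discussion of sequences $\sigma$ and recall from Lemma 6.6 the
correspondence to the POP operation. This now allows us to lower bound the
number of accesses required to perform a POP operation.

Theorem 6.4. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function
$f: X \cup \{\emptyset\} \to S^+$. Let $\rho: \mathbb{D} \to S^+$ be any BOS endmarker
representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\emptyset\}_{n_{|d|+1}}.$$

Let $\alpha_{\mathrm{POP}}$ be any implementation of the POP operation. Then for all
$d \in \mathbb{D}$:

$$\#[\alpha_{\mathrm{POP}}(\rho(d))] \ge \begin{cases}
3 \cdot \frac{|d| - 1}{2} + 2 & \text{if } |d| \text{ is odd} \\
3 \cdot \frac{|d|}{2} + 1 & \text{if } |d| \text{ is even.}
\end{cases}$$

In other words,

$$\#[\alpha_{\mathrm{POP}}(\rho(d))] \ge \left\lceil \frac{3|d| + 1}{2} \right\rceil.$$

∎

Theorem 6.5 combines the results of Theorems 6.3 and 6.4, along with the trivial
observation that $\#[\alpha_{\mathrm{TOP}}(\rho(d))] \ge 1$.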



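Theorem 6.5. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function
$f: X \cup \{\emptyset\} \to S^+$. Let $\rho: \mathbb{D} \to S^+$ be any BOS endmarker
representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\emptyset\}_{n_{|d|+1}}.$$

Let $\alpha_{\mathrm{POP}}$ be any implementation of the POP operation, let
$\alpha_{\mathrm{PUSHx}}$ be any implementation of the PUSHx operation, and let
$\alpha_{\mathrm{TOP}}$ be any implementation of the TOP operation. Then for all
$d \in \mathbb{D}$:

$$\#[\alpha_{\mathrm{POP}}(\rho(d))] \ge \left\lceil \frac{3|d| + 1}{2} \right\rceil$$
$$\#[\alpha_{\mathrm{PUSHx}}(\rho(d))] \ge |d| + 2$$
$$\#[\alpha_{\mathrm{TOP}}(\rho(d))] \ge 1.$$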

Recalling the algorithm for POP that we presented in Example 6.8, we now
know that that algorithm is optimal for $|d|$ odd. Perhaps it would be possible to do
one access better, however, when $|d|$ is even. As a consequence of the following
lemma, it is impossible to simultaneously achieve the bounds of Theorem 6.4 for
both $|d|$ odd and $|d|$ even.

Lemma 6.10. Let $i$ be any even natural number. Suppose we have some
minimal length sequence $\sigma_0(0,i)$ and some minimal length sequence
$\sigma_1(0,i+1)$. Then $\sigma_0(0,i)$ is not a prefix of $\sigma_1(0,i+1)$.

Proof: The sequence $\sigma_0(0,i)$ must contain
$\underline{0}, \underline{1}, \ldots, \underline{i-1}, i$, and $\sigma_1(0,i+1)$ must
contain $\underline{0}, \underline{1}, \ldots, \underline{i-1}, \underline{i}, i+1$. Since
$i$ is even, $|\sigma_0(0,i)| = 3 \cdot \frac{i}{2} + 1$. Because $i + 1$ is odd and
$\sigma_1(0,i+1)$ also has minimal length, $|\sigma_1(0,i+1)| = 3 \cdot \frac{i}{2} + 2$.
Thus,

$$|\sigma_1(0,i+1)| = |\sigma_0(0,i)| + 1.$$

Suppose $\sigma_0(0,i)$ is a prefix of $\sigma_1(0,i+1)$. Since $|\sigma_0(0,i)|$ is
minimal, $\sigma_0(0,i)$ does not contain $i + 1$, and therefore also does not contain
$\underline{i}$, both of which must be present in $\sigma_1(0,i+1)$. Only one more term
may be appended, yet two terms are missing. So there is no way to append a sequence
to $\sigma_0(0,i)$ in order to obtain a minimal length sequence $\sigma_1(0,i+1)$. ∎

Note that the proof of Lemma 6.10 does not hold if $i$ is an odd number, because
$|\sigma_1(0,i+1)| = 2 + |\sigma_0(0,i)|$ for $i$ odd.

Theorem 6.5 gave lower bounds for implementing the stack operations with a
BOS endmarker representation. Example 6.7 showed that the bounds for the
PUSHx and TOP operations are actually achievable. We can also argue, as a
consequence of Lemma 6.10, that the algorithm $\alpha_{\mathrm{POP}}$ from Example 6.8 is
access optimal. Since $\alpha_{\mathrm{POP}}$ has a minimal number of accesses for $|d|$ odd,
it cannot possibly achieve $3 \cdot \frac{|d|}{2} + 1$ accesses for $|d|$ even. Thus, the
best it could possibly do would be $3 \cdot \frac{|d|}{2} + 2$ accesses for $|d|$ even, which
is precisely what it does do. The following example shows that we could have
constructed an algorithm for the POP operation which would have been minimal for
$|d|$ even.

Example 6.9. Reconsider the representation $\rho$ from Examples 6.7 and 6.8. The
algorithm $\alpha_{\mathrm{POP}}$ from Example 6.8 is access optimal. Let
$\alpha'_{\mathrm{POP}}$ be an algorithm for the POP operation which has the field access
sequence

$$0, 2, \underline{1}, \underline{0}, 4, \underline{3}, \underline{2}, 6, \underline{5}, \underline{4}, 8, \underline{7}, \underline{6}, \ldots$$

Note that $\alpha'_{\mathrm{POP}}$ is, in fact, realizable, because this is basically the same
algorithm we had before, only with a different starting sequence. This algorithm has
for an access cost:

$$\#[\alpha'_{\mathrm{POP}}(\rho(d))] = \begin{cases}
3 \cdot \frac{|d|}{2} + 1 & \text{if } |d| \text{ is even} \\
3 \cdot \frac{|d| - 1}{2} + 4 & \text{if } |d| \text{ is odd.}
\end{cases}$$

Thus, $\alpha'_{\mathrm{POP}}$ requires a minimal number of accesses for $|d|$ even and is
also access optimal. In fact, for Example 6.1, the BOS endmarker representation $\rho$
with TOP and PUSHx implemented as in 6.1 and the POP implemented as in Example 6.8 is
a storage and access optimal implementation
$(\rho, \alpha_{\mathrm{POP}}, \alpha_{\mathrm{PUSHx}}, \alpha_{\mathrm{TOP}})$.
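The relationship between the two POP algorithms and the lower bound of Theorem 6.4 is easy to tabulate. The following sketch simply evaluates the three closed-form expressions stated in the text; the function names are hypothetical and nothing is simulated.

```python
def lower_bound(n):
    """Theorem 6.4: ceil((3n + 1) / 2) accesses for a stack of length n."""
    return (3 * n + 2) // 2


def cost_example_6_8(n):
    """Cost stated in Example 6.8: 3*floor(n/2) + 2 for both parities."""
    return 3 * (n // 2) + 2


def cost_example_6_9(n):
    """Cost stated in Example 6.9: 3n/2 + 1 (n even), 3(n-1)/2 + 4 (n odd)."""
    return 3 * (n // 2) + (1 if n % 2 == 0 else 4)


def table(limit):
    """Rows (n, lower bound, Example 6.8 cost, Example 6.9 cost)."""
    return [(n, lower_bound(n), cost_example_6_8(n), cost_example_6_9(n))
            for n in range(limit)]
```

Evaluating `table(5)` shows the pattern discussed above: the algorithm of Example 6.8 meets the bound exactly when $|d|$ is odd and is one access above it when $|d|$ is even, while the algorithm of Example 6.9 meets the bound when $|d|$ is even and is two accesses above it when $|d|$ is odd.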

As was the case for the TOS endmarker representation, Theorem 5.10
immediately tells us that a BOS endmarker representation achieves Kraft storage
when the function $f$ does.

Theorem 6.6. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider the function
$f: X \cup \{\emptyset\} \to S^+$. If the function $f$ achieves Kraft storage, then the
BOS endmarker representation $\rho: \mathbb{D} \to S^+$, defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\emptyset\}_{n_{|d|+1}},$$

also achieves Kraft storage.

So we have constructed the BOS endmarker representation as an alternative to the
TOS endmarker representation scheme. We thereby decreased the access cost for
performing a TOP operation, but in so doing we increased the cost of a POP
operation. For a summary of the worst case lower bounds, see Table 6.3 at the end
of the chapter.

6.4 The TOS Pointer Representation

Consider using a pointer representation to implement a stack and the
POP, PUSH, and TOP operations. The following example illustrates one possible
such implementation.



Example 6.10. Let $X = \{a, b, c, d\}$, $S = \{0, 1, 2, 3\}$, and
$\mathbb{D} = \bigcup_{i=0}^{3} X^i$. Let the function $f: X \to S^*$ be defined by

    f(a) = 0
    f(b) = 1
    f(c) = 2
    f(d) = 3

Define the concatenation-preserving pointer representation $\rho: \mathbb{D} \to S^+$ by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{i} \cup \{\ell(|d|)\}_{0},$$

where the pointer component $\ell: J \to S$ is defined by

    ℓ(0) = 0
    ℓ(1) = 1
    ℓ(2) = 2
    ℓ(3) = 3

For instance (the second line shows the one-element stack containing the symbol d),

    $\rho(\lambda) = 0$
    $\rho(\mathrm{d}) = 13$
    $\rho(cab) = 3201$

We assume that $L$ is large enough to represent any $d \in \mathbb{D}$; in particular
$L \ge 4$. In order to perform a POP operation in this example we need only decrement the
pointer. Notice that there is no need to read any stack elements, since decrementing
the pointer automatically decreases $|\rho(d)|$ by one. So we could use the following
simple algorithm to perform a POP operation.

    α_POP: if m(0) = 0 then return "Error"
           m(0) ← m(0) - 1

By our definition of a memory cell access, this algorithm for POP corresponds to a
single access; we read the contents of cell 0 and then, depending on its contents, we
may rewrite the value. If $m(0) = 0$, then we return an "Error" message and the
second line in the algorithm never gets executed. For a PUSHx or a TOP
operation, however, we must read the pointer in order to determine where the top
of the stack is, and we then go to the appropriate stack location to perform the
operation.

    α_PUSHx: if m(0) = 3 then return "Error"
             m(0) ← m(0) + 1
             m(m(0)) ← f(x)

    α_TOP:   if m(0) = 0 then return "Error"
             return m(m(0))

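These algorithms give the following access costs:

$$\#[\alpha_{\mathrm{POP}}(\rho(d))] = 1$$

$$\#[\alpha_{\mathrm{PUSHx}}(\rho(d))] = \begin{cases} 2 & \text{if } |d| \ne 3 \\ 1 & \text{else} \end{cases}$$

$$\#[\alpha_{\mathrm{TOP}}(\rho(d))] = \begin{cases} 2 & \text{if } |d| \ne 0 \\ 1 & \text{else} \end{cases}$$

Notice that the representation $\rho$ in the above example allowed us to implement the
set of stack states $\mathbb{D} = \bigcup_{i=0}^{3} X^i$ with low update costs, lower than was
possible with the TOS or BOS endmarker representations.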
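The single-cell pointer scheme of Example 6.10 is small enough to simulate directly. This is a minimal sketch under assumptions: `CountingMemory`, `rho`, and the constant `K` (the maximum stack length, 3 in the example) are hypothetical names, and unused cells are filled with 0. Reading the pointer and rewriting it count as one access, as in the text.

```python
CODE = {"a": 0, "b": 1, "c": 2, "d": 3}      # f from Example 6.10
DECODE = {v: k for k, v in CODE.items()}
K = 3                                        # longest stack in the domain


class CountingMemory:
    """Hypothetical helper: cells plus a counter of cell accesses."""

    def __init__(self, cells):
        self.cells = list(cells)
        self.accesses = 0

    def access(self, i, rewrite=None):
        # One access = read cell i, then optionally rewrite it.
        self.accesses += 1
        old = self.cells[i]
        new = rewrite(old) if rewrite is not None else None
        if new is not None:
            self.cells[i] = new
        return old


def rho(d):
    """Pointer |d| in cell 0, element d(i) in cell i."""
    return CountingMemory([len(d)] + [CODE[x] for x in d] + [0] * (K - len(d)))


def pop(mem):
    old = mem.access(0, lambda p: p - 1 if p > 0 else None)
    return "Error" if old == 0 else "ok"


def push(mem, x):
    old = mem.access(0, lambda p: p + 1 if p < K else None)
    if old == K:
        return "Error"
    mem.access(old + 1, lambda _: CODE[x])   # write f(x) at the new top
    return "ok"


def top(mem):
    p = mem.access(0)
    if p == 0:
        return "Error"
    return DECODE[mem.access(p)]
```

For example, `rho("cab").cells` is `[3, 2, 0, 1]`, matching the memory state 3201 shown in the example, and `pop` on any state performs exactly one access.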

We extend the pointer scheme illustrated in Example 6.10 and make the
following definition.

Definition. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$, for $k \in \mathbb{N}^+$, and consider a
function $f: X \to S^*$. Let $\rho: \mathbb{D} \to S^+$ be any fixed position field pointer
representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\ell(|d|)\}_{n_0},$$

where $n_i \in \mathbb{N}$ (for any $0 \le i \le k$) and where $\ell$ is a representation
$\ell: J \to S^+$. Then $\rho$ is a TOS pointer representation.

We use the term TOS because reading the pointer component, $\ell(|d|)$, tells us $|d|$
and we can then go directly to the field $n_{|d|}$ in order to determine the top of stack
element. Clearly the representation $\rho$ from Example 6.10 is a TOS pointer
representation. If $|\mathbb{D}|$ is large, and especially if
$\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, then the size of the pointer will grow large.
Therefore, we may sometimes find it convenient to view the TOS pointer representation
as a separate pointer representation.

Restricting our consideration to concatenation-preserving representations is
perhaps an obvious thing to do, but let us discuss why we also require that a TOS
pointer representation have fixed position fields. The fixed position field
assumption is included as a consequence of our definition of a pointer
representation, where we chose to encode $|d|$ rather than $|\rho(d)|$. If we were to allow
variable position fields, then knowing $\ell(|d|)$ would not necessarily tell us the
location of the top of the stack.

Unfortunately, requiring fixed position fields will, in general, result in "gaps"
in the representation, unless $|f(x_1)| = |f(x_2)|$ for all $x_1, x_2 \in X$. Thus, if we
insist on Kraft storage, a TOS pointer representation must sometimes have gaps when
$|X| \ne |S|^j$. We could, alternatively, have defined a TOS pointer representation
$\rho$ to be a concatenation-preserving representation $\rho: \mathbb{D} \to S^*$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{\ell(|\rho(d)|)\}_{0},$$

where

$$n_i = |\ell(|\rho(d)|)| + \sum_{j=1}^{i-1} |f(d(j))|.$$

Such a definition would avoid the problem of having gaps in the storage of $\rho(d)$
and would not affect the storage and access results we obtain. Thus, our original
definition of a TOS pointer representation is satisfactory for our purposes.

In Example 6.10, the domain size was small enough that the pointer
component was able to fit in a single memory cell. For a larger but bounded
domain size, we can still store a stack pointer in a fixed number of memory cells, as
we do in the following example.

Example 6.11. Let $X = \{a, b, c\}$, $S = \{0, 1\}$, and
$\mathbb{D} = \bigcup_{i=0}^{7} X^i$. Define the function $f: X \to S^*$ by

    f(a) = 0
    f(b) = 10
    f(c) = 11

and the pointer component $\ell: J \to S^*$ by

    ℓ(0) = 000        ℓ(4) = 100
    ℓ(1) = 001        ℓ(5) = 101
    ℓ(2) = 010        ℓ(6) = 110
    ℓ(3) = 011        ℓ(7) = 111.

Then we can define the TOS pointer representation $\rho: \mathbb{D} \to S^*$ by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{2(i-1)+3} \cup \{\ell(|d|)\}_{0}.$$

(We assume, of course, that $L \ge 17$.) Then we have, for instance,

    $\rho(\lambda) = 000$
    $\rho(abc) = 0110\_1011$
    $\rho(accab) = 1010\_11110\_10$

Notice that the representation $\rho$ achieves Kraft storage.

We can implement the stack operations roughly as follows. For the TOP
operation, we read the three pointer cells and then go to the top of the stack to look
up the answer.

    α_TOP: temp1 ← m(2) + 2·m(1) + 4·m(0)
           if temp1 = 0 then return "Error"
           temp2 ← 2·(temp1 - 1) + 3
           if m(temp2) = 0 then return "a"
           else if m(temp2+1) = 0 then return "b"
                else return "c"

In order to do a PUSHx operation, we must increment the three pointer cells as we
read them, and we then insert the correct item onto the stack.

    α_PUSHx: if m(2) = 0 then m(2) ← 1
                              goto write
             m(2) ← 0
             if m(1) = 0 then m(1) ← 1
                              goto write
             m(1) ← 0
             if m(0) = 0 then m(0) ← 1
                              goto write
             m(2) ← 1
             m(1) ← 1
             return "Error"
    write:   temp1 ← 2·(m(2) + 2·m(1) + 4·m(0)) + 1
             if x = a then m(temp1) ← 0
             else m(temp1) ← 1
             if x = b then m(temp1+1) ← 0
             else m(temp1+1) ← 1

For the POP operation, we need only decrement the pointer. Unfortunately, this
may require accessing some pointer memory cell more than once. The following
simple algorithm is one possibility.

    α_POP: if m(2) = 1 then m(2) ← 0
                            return
           m(2) ← 1
           if m(1) = 1 then m(1) ← 0
                            return
           m(1) ← 1
           if m(0) = 1 then m(0) ← 0
                            return
           m(2) ← 0
           m(1) ← 0
           return "Error"

Notice that this algorithm causes us to incorrectly change $m(1)$ and $m(2)$ in the
case where an "Error" condition is to be returned, thus forcing us to go back and
rewrite these cells.

Excluding the cases where we get an "Error" condition, these three algorithms
give us the following access costs, for all $d \in \mathbb{D}$:

$$5 \ge \#[\alpha_{\mathrm{TOP}}(\rho(d))] \ge 4$$

$$5 \ge \#[\alpha_{\mathrm{PUSHx}}(\rho(d))] \ge 4$$

$$3 \ge \#[\alpha_{\mathrm{POP}}(\rho(d))] \ge 1$$

∎
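The three-cell binary pointer of Example 6.11 can be simulated to see how incrementing and decrementing "as we read" keeps the access counts low. The sketch below is illustrative: `CountingMemory`, `rho`, and the gap marker `"_"` are assumptions introduced here, and, as a deliberate simplification, `push` writes only the $|f(x)|$ cells of the code word. Cell 0 holds the most significant pointer bit.

```python
F = {"a": (0,), "b": (1, 0), "c": (1, 1)}    # f from Example 6.11


class CountingMemory:
    """Hypothetical helper: cells plus a counter of cell accesses."""

    def __init__(self, cells):
        self.cells = list(cells)
        self.accesses = 0

    def access(self, i, rewrite=None):
        # One access = read cell i, then optionally rewrite it.
        self.accesses += 1
        old = self.cells[i]
        new = rewrite(old) if rewrite is not None else None
        if new is not None:
            self.cells[i] = new
        return old


def rho(d, size=17, gap="_"):
    n = len(d)
    cells = [(n >> 2) & 1, (n >> 1) & 1, n & 1] + [gap] * (size - 3)
    for i, x in enumerate(d, start=1):
        base = 2 * (i - 1) + 3               # fixed position of field i
        cells[base:base + len(F[x])] = F[x]
    return CountingMemory(cells)


def top(mem):
    p = 4 * mem.access(0) + 2 * mem.access(1) + mem.access(2)
    if p == 0:
        return "Error"
    base = 2 * (p - 1) + 3
    if mem.access(base) == 0:
        return "a"
    return "b" if mem.access(base + 1) == 0 else "c"


def push(mem, x):
    carry, pointer = True, 0
    for weight, cell in ((1, 2), (2, 1), (4, 0)):
        if carry:
            msb = cell == 0
            # Flip while the carry propagates; never flip a 1 in the top bit.
            old = mem.access(cell, lambda b: None if (msb and b == 1) else 1 - b)
            carry, new = old == 1, 1 - old
        else:
            new = mem.access(cell)
        pointer += weight * new
    if carry:                                # pointer was 111: undo two flips
        mem.access(2, lambda b: 1)
        mem.access(1, lambda b: 1)
        return "Error"
    base = 2 * pointer + 1                   # field of the new top element
    for j, bit in enumerate(F[x]):
        mem.access(base + j, lambda old, bit=bit: bit)
    return "ok"


def pop(mem):
    for cell in (2, 1, 0):
        msb = cell == 0
        # A 1 becomes 0 and we stop; a 0 becomes 1 (borrow), except in cell 0.
        if mem.access(cell, lambda b: 0 if b == 1 else (None if msb else 1)) == 1:
            return "ok"
    mem.access(2, lambda b: 0)               # pointer was 000: undo two borrows
    mem.access(1, lambda b: 0)
    return "Error"
```

With these definitions, joining the first nine cells of `rho("abc")` gives `0110_1011`, the memory state shown in the example. A `pop` that is not an error touches between one and three pointer cells, while the error case touches five, which is the $2r - 1$ figure for $r = 3$.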

The strategy used in Example 6.11 for implementing the stack could be used
with any TOS pointer representation which has a fixed size pointer field.

Definition. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$, let
$r = \lceil \log_{|S|}(k + 1) \rceil$, and let $f$ be a function $f: X \to S^*$. Suppose the
pointer component $\ell$ is a one-to-one function $\ell: \{0, 1, \ldots, k\} \to S^r$. Then
the TOS pointer representation $\rho: \mathbb{D} \to S^+$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i + r} \cup \{\ell(|d|)\}_{0}$$

is said to be a TOS pointer representation with a fixed size pointer field.

The TOS pointer representations in both Examples 6.10 and 6.11 have fixed size
pointer fields, and we implemented the stack operations in essentially the same way,
first reading the pointer and then, if necessary, accessing the list component.

Theorem 6.7. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$ and let
$r = \lceil \log_{|S|}(k + 1) \rceil$. Let $f$ be a function $f: X \to S^*$ such that
$\max_{x \in X} |f(x)| = t$. Consider the TOS pointer representation
$\rho: \mathbb{D} \to S^*$, with a fixed size pointer field, defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{t(i-1)+r} \cup \{\ell(|d|)\}_{0},$$

where $\ell: J \to S^r$. Then it is possible to define the representation $\ell$ in such a
way that the stack operations can be implemented with algorithms which have
the following access costs. For all $d \in \mathbb{D}$,

$$r + t \ge \#[\alpha_{\mathrm{TOP}}(\rho(d))] \ge r + 1$$
$$r + t \ge \#[\alpha_{\mathrm{PUSHx}}(\rho(d))] \ge r + 1$$
$$2r - 1 \ge \#[\alpha_{\mathrm{POP}}(\rho(d))] \ge 1$$

Proof: The construction of algorithms for the TOP, PUSHx, and POP operations
is the same as in Example 6.11, and we shall not present all of the details here. We
define $\ell(i) \in S^r$ so that when the string $\ell(i)$ is viewed as a number it is the base
$|S|$ representation of $i$ (with preceding 0's, if necessary, since $|\ell(i)| = r$).

We construct $\alpha_{\mathrm{TOP}}$ so that it reads the $r$ memory cells in the pointer and
then goes to field $n_{|d|}$ to read the top stack element. Thus, $\alpha_{\mathrm{TOP}}$
accesses at least $r + 1$ and at most $r + t$ memory cells, depending on the size of the
representation of the element at the top of the stack.

Now consider implementing an $\alpha_{\mathrm{PUSHx}}$ algorithm. By the way we have
defined the pointer component $\ell$, it is possible to increment the pointer as we read
it, if we access cells in the order $m(r-1), m(r-2), \ldots, m(1), m(0)$. (See Example 6.11
for an illustration.) After reading the $r$ pointer cells, we locate the appropriate field
and write $f(x)$, a total of $r + |f(x)|$ accesses.

For the $\alpha_{\mathrm{POP}}$ algorithm we need only decrement the pointer. So it would
never be necessary to make more than $2r - 1$ accesses, because we could just read
the pointer in one pass and rewrite it in the next. On the lower bound side, we
clearly need to make at least one access. ∎

Notice that, using the method from Example 6.11, the $2r - 1$ upper bound on the
number of accesses for the POP operation would be attained only when $|d| = 0$ and
an "Error" message is returned. For $d \ne \lambda$, $r$ would be an upper bound and we
frequently would be able to do even better.
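The fixed size pointer strategy can be illustrated with a small Python sketch. It is not the thesis's own code: the names `Memory`, `FixedPointerStack`, and `visit` are hypothetical, every element is assumed to occupy exactly one cell (so $t = 1$), and one "access" is modeled as one visit to a cell, during which the cell may be read and rewritten. Pushing onto a full stack is outside the scope of the sketch.

```python
class Memory:
    """Cellular memory that counts cell visits (a visit may read and rewrite a cell)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.accesses = 0

    def visit(self, i, update=None):
        self.accesses += 1
        old = self.cells[i]
        if update is not None:
            self.cells[i] = update(old)
        return old


class FixedPointerStack:
    """Single-cell elements; cells 0..r-1 hold the length in base `base`, high digit first."""

    def __init__(self, base, r):
        self.base, self.r = base, r
        self.capacity = base ** r - 1           # k, the largest representable length
        self.mem = Memory(r + self.capacity)

    def _read_length(self, delta=0):
        """Read the pointer low-order cell first, adding delta (0 or 1) on the way."""
        n, carry, weight = 0, delta, 1
        for i in range(self.r - 1, -1, -1):
            old = self.mem.visit(i, lambda v, c=carry: (v + c) % self.base)
            carry = 1 if old + carry == self.base else 0
            n += old * weight
            weight *= self.base
        return n                                 # the length before the increment

    def top(self):
        n = self._read_length()                  # r visits
        if n == 0:
            raise IndexError("Error: empty stack")
        return self.mem.visit(self.r + n - 1)    # one more visit reads the top field

    def push(self, x):
        n = self._read_length(delta=1)           # r visits; pointer now holds n + 1
        if n == self.capacity:
            raise OverflowError("stack full (pointer wrapped; not restored here)")
        self.mem.visit(self.r + n, lambda _: x)  # one more visit writes the element

    def pop(self):
        for i in range(self.r - 1, -1, -1):
            last = (i == 0)
            old = self.mem.visit(
                i, lambda v, last=last: v if (last and v == 0) else (v - 1) % self.base)
            if old != 0:
                return                           # borrow absorbed after at most r visits
        for i in range(1, self.r):               # empty stack: undo the borrows
            self.mem.visit(i, lambda _: 0)
        raise IndexError("Error: empty stack")
```

With `base=4` and `r=2`, a TOP or PUSH costs $r + 1 = 3$ visits, a POP on a nonempty stack costs at most $r = 2$, and only a POP on the empty stack reaches $2r - 1 = 3$.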

In the proof of Theorem 6.7, the only reference to the particular pointer $\ell$ we
chose was in obtaining the upper bound for the cost of performing a PUSHx
operation. As we argued there for the POP operation, it would always be possible
to increment the pointer by making $2r - 1$ accesses. This gives us the following
corollary.

Corollary 6.7.1. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$ and let $r = \lceil \log_{|S|}(k+1) \rceil$. Let $f$ be a
function $f: X \to S^*$ such that $\max_{x \in X} |f(x)| = t$. Consider the TOS pointer
representation $\rho: \mathbb{D} \to S^*$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{t(i-1)+r} \cup \{\ell(|d|)\}_0 .$$

Then for any one-to-one pointer function $\ell: J \to S^r$, it is possible to implement
the stack operations so as to obtain the following access costs. For all $d \in \mathbb{D}$,

$$r + t \ge \#[\alpha_{TOP}(\rho(d))] \ge r + 1$$

$$2r - 1 + t \ge \#[\alpha_{PUSHx}(\rho(d))] \ge r + 1$$

$$2r - 1 \ge \#[\alpha_{POP}(\rho(d))] \ge 1$$

Theorem 6.7 and Corollary 6.7.1 gave us upper bounds on access costs for
performing POP, PUSHx, and TOP operations using a TOS pointer representation
with fixed position fields. The bounds depend on $r$, not on $|d|$, although the size
of $r$ itself is dependent on $\max_{d \in \mathbb{D}} |d|$: $r = \lceil \log_{|S|}(\max_{d \in \mathbb{D}} |d| + 1) \rceil$. Thus, when $|d|$ is
small, being forced to read $r$ cells could be relatively expensive (e.g., when $r$ is
large and the stacks we are representing are small). Consider, however, where
these bounds came from. We can rewrite the result of Corollary 6.7.1. For any TOS
pointer representation with a fixed size pointer field, we can implement the stack
operations with the following access costs:

$$\#[\alpha_{TOP}(\rho(d))] \le |\ell(|d|)| + |f(q_{TOP}(d))|$$

$$\#[\alpha_{PUSHx}(\rho(d))] \le |\ell(|d|)| + |f(x)|$$

$$\#[\alpha_{POP}(\rho(d))] \le \begin{cases} 2|\ell(|d|)| - 1 & \text{for } |d| = 0 \\ |\ell(|d|)| & \text{for } |d| \ne 0 \end{cases}$$

assuming that the function $f$ is a representation and achieves Kraft storage.

Let us now extend these results to TOS pointer representations where we do
not have fixed size pointer fields. We would also like to allow $\mathbb{D} = \bigcup_{i=0}^{k} X^i$, where
$k \le \infty$. From the above discussion it should be easy to see that the following
theorem holds.

Theorem 6.8. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$, where $k \le \infty$, and let $f: X \to S^*$ achieve Kraft
storage. Consider the TOS pointer representation $\rho: \mathbb{D} \to S^*$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\ell(|d|)\}_{n_\ell},$$

where $\ell$ is any representation $\ell: J \to S^*$. Then it is possible to implement the
stack operations so as to achieve the following access costs. For all $d \in \mathbb{D}$,

$$\#[\alpha_{TOP}(\rho(d))] \le |\ell(|d|)| + |f(q_{TOP}(d))|$$

$$\#[\alpha_{PUSHx}(\rho(d))] \le 2|\ell(|d|)| + |f(x)| - 1$$

$$\#[\alpha_{POP}(\rho(d))] \le 2|\ell(|d|)| - 1$$

Proof: For any TOS pointer representation, reading the pointer immediately tells
us the location of the top of the stack. So we can certainly perform a TOP
operation by accessing each pointer cell and then reading enough cells in the list
component for us to distinguish $q_{TOP}(d)$. Since $f$ achieves Kraft storage, this is
precisely $|\ell(|d|)| + |f(q_{TOP}(d))|$. For the PUSHx and POP operations, it is, in
general, necessary to rewrite the pointer, which at worst would require $2|\ell(|d|)| - 1$
accesses: one pass over $\ell(|d|)$ to read and the next to rewrite. For a POP, we
need not access the list component at all, and for a PUSHx, we need to write $f(x)$
into memory. ∎

From Theorem 6.8, the issue is now to see how compact we can make our
pointer component $\ell(|d|)$. Recalling the construction of the class of pointers $\ell$
from Section 5.4 (see Table 5.2), we have a possible representation scheme, with

$$|\ell(|d|)| = O(\log |d|).$$

Consider using this scheme to perform a PUSHx or a POP. Since each pointer is a
representation of a natural number $n$, we want to be able to increment or decrement
by one the number to be represented. For the scheme in Section 5.4, this means we
always need to alter the "rightmost" cell in the pointer representation. Since the size
of the pointer component is not fixed (in fact, it may be unbounded), there is no
way to know where this rightmost cell is, unless we read (most of) $\ell(|d|)$. Even
then, we might be forced to backtrack. We shall construct a new pointer scheme,
with the same storage cost as our previous one, but for which it will be easier to
increment and decrement the pointer. This makes it, on the average, cheaper to
perform a POP operation.

Recall the pointer representation scheme $\ell_2^1$ as illustrated in Table 5.1:

$$\ell_2^1(n) = 0^{|h(n)|} \cdot 1 \cdot h(n),$$

where we write $h(n)$ for $h_2(n)$. For instance, consider

$$\ell_2^1(18) = 000010011.$$

In order to perform a POP operation on a stack of length 18, we need to decrement
the pointer, leaving us

$$\ell_2^1(17) = 000010010.$$

Notice that we needed to alter only the last bit in the pointer, but there is no way
to locate this last bit without reading the entire pointer. If we could rearrange bits
so that we read the last bit (of $h(n)$) early, then whenever $n$ is even we would
just change the appropriate bit to 0 and immediately be finished with our POP
operation. We can do this by interspersing the bits of $\ell_2^1(n)$ from the $|h(n)|$
component with those from the $h(n)$ component (using an extra 1 to denote the
end of the pointer representation). Note that these two components each have the
same number of bits. Since we would like to be able to read the last bit of $h(n)$ as
early as possible, we reverse the bit order of $h(n)$. Such a strategy gives us

$$\Lambda_2^1(18) = 0\underline{1}0\underline{1}0\underline{0}0\underline{0}1$$

$$\Lambda_2^1(17) = 0\underline{0}0\underline{1}0\underline{0}0\underline{0}1.$$

For clarity, we have underlined the bits that come from the $h(n)$ component. Some
additional values of $\Lambda_2^1$ are given in Table 6.1.

We now give a formal definition of the pointer representation scheme $\Lambda_2^1$.
We begin with the following preliminary definition, based on the definition of the
string $h_{|S|}(n)$ from Section 5.4.

Definition. For $|S| \ge 2$, we define the string $\theta_n^{|S|}$ to be the reverse of the string
$h_{|S|}(n)$. For $1 \le i \le |\theta_n^{|S|}|$, we write $\theta_n^{|S|}(i)$ to denote the $i$th component of the
string $\theta_n^{|S|}$. For notational simplicity we may simply write $\theta_n$ to stand for $\theta_n^{|S|}$.

So $\theta_n(1)$ is the last character in the string $h_{|S|}(n)$, $\theta_n(2)$ is the next to the
last character in the string $h_{|S|}(n)$, etc.

Example 6.12. Since $h_2(18) = 0011$, we have $\theta_{18} = 1100$, and so $\theta_{18}(1) = 1$,
$\theta_{18}(2) = 1$, $\theta_{18}(3) = 0$, $\theta_{18}(4) = 0$. ∎

We now define the pointer representation $\Lambda_2^1$ in terms of the string $\theta_n$.

Definition. Let $|S| \ge 2$. We define the pointer representation scheme $\Lambda_2^1$ as
follows:

$$\Lambda_2^1(n) = \bigcup_{i=1}^{|\theta_n|} \{0 \cdot \theta_n(i)\}_{2(i-1)} \cup \{1\}_{2|\theta_n|}.$$

We illustrate the definition with an example.

Example 6.13. Let us determine the pointer $\Lambda_2^1(26)$. Recall from Section 5.4 that
$h_2(26) = 1011$. So $\theta_{26} = 1101$, and

$$\Lambda_2^1(26) = \{0 \cdot 1\}_0 \cup \{0 \cdot 1\}_2 \cup \{0 \cdot 0\}_4 \cup \{0 \cdot 1\}_6 \cup \{1\}_8 = 010100011. \qquad \blacksquare$$
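The two pointer encodings can be checked mechanically. The Python sketch below is an illustration, not part of the thesis; the function names `h`, `ell`, and `lam` are hypothetical. It assumes that $h_2(n)$ is the binary representation of $n + 1$ with its leading 1 removed, which is inferred from the worked values $h_2(17) = 0010$, $h_2(18) = 0011$, and $h_2(26) = 1011$ rather than taken from the Section 5.4 definition.

```python
def h(n):
    """Assumed form of h_2(n): binary representation of n + 1 without its leading 1."""
    return bin(n + 1)[3:]          # bin() yields '0b1...'; drop '0b' and the leading 1


def ell(n):
    """Length-prefixed pointer: |h(n)| zeros, a terminating 1, then h(n)."""
    return "0" * len(h(n)) + "1" + h(n)


def lam(n):
    """Interleaved pointer: a 0 marker before each bit of reversed h(n), then a final 1."""
    theta = h(n)[::-1]             # theta_n, the reverse of h(n)
    return "".join("0" + bit for bit in theta) + "1"


if __name__ == "__main__":
    for n in (17, 18, 26):
        print(n, h(n), ell(n), lam(n))
```

Both strings use the same multiset of bits, so the interleaved form costs no extra storage; only the order in which the bits are met changes.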

Table 6.1 gives the pointer representations $\Lambda_2^1(n)$ for $0 \le n \le 33$.

Now that we have defined the pointer representation scheme $\Lambda_2^1$, let us use
this scheme and determine access costs for implementing the stack operations.

Table 6.1. Construction of pointer representation $\Lambda_2^1$.

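Theorem 6.9. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, let $S = \{0,1\}$, and consider a function
$f: X \to S^+$. Let $\rho: \mathbb{D} \to S^+$ be a TOS pointer representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\Lambda_2^1(|d|)\}_{n_\ell},$$

where $n_i, n_\ell \in \mathbb{N}$. Let $k_1 \in \mathbb{N}$. Then it is possible to implement the stack
operations so as to achieve the following access costs for all $d \in \mathbb{D}$.

$$\#[\alpha_{TOP}(\rho(d))] \le |\Lambda_2^1(|d|)| + k_1$$

$$\#[\alpha_{PUSHx}(\rho(d))] \le |\Lambda_2^1(|d|)| + |f(x)| + 2$$

$$\#[\alpha_{POP}(\rho(d))] \le |\Lambda_2^1(|d|)| + 1$$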

Proof: As we have previously seen, it is certainly possible to implement the TOP
operation by reading the entire pointer and then going to the appropriate location
to look up the answer $q_{TOP}(d)$. Although a lookup of this answer might require
making more than $|f(q_{TOP}(d))|$ accesses, it cannot take more than some constant
number of accesses, depending on details of the function $f$.

We have constructed the representation scheme $\Lambda_2^1$ so that it will be easy to
decrement the stack pointer. Consider the following algorithm:

    α_POP:   if m(0) = 1 then return "Error"
             i ← 1
    loop:    if m(i) = 1 then m(i) ← 0
                              return
             m(i) ← 1
             if m(i+1) = 1 then m(i-1) ← 1
                                return
             i ← i + 2
             goto loop

In this algorithm, we read the pointer from left to right and never backtrack over
more than one cell. This gives the desired bound for POP.

A similar scheme allows us to perform a PUSHx operation.

    α_PUSHx: if m(0) = 1 then m(0) ← 0
                              m(1) ← 0
                              m(2) ← 1
                              return
             i ← 1
    loop:    if m(i) = 0 then m(i) ← 1
                              return
             m(i) ← 0
             if m(i+1) = 1 then m(i+1) ← 0
                                m(i+2) ← 0
                                m(i+3) ← 1
                                return
             i ← i + 2
             goto loop

In this case we read the entire pointer and, although we never need to backtrack,
we sometimes need to rewrite two additional cells. Having incremented the pointer,
we can insert $f(x)$ in the appropriate field with $|f(x)|$ accesses. ∎
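The two pointer-update loops can be exercised directly on a list of bits. The following Python sketch is an illustration under stated assumptions rather than the thesis's implementation: the names `lam_bits`, `decode`, `pop_pointer`, and `push_pointer` are hypothetical, $h_2(n)$ is assumed to be the binary representation of $n + 1$ without its leading 1 (inferred from the worked examples), and only the pointer is modeled, not the list component.

```python
def lam_bits(n):
    """Interleaved pointer for n as a list of bits (assumed form of h_2, see lead-in)."""
    theta = bin(n + 1)[3:][::-1]
    bits = []
    for b in theta:
        bits += [0, int(b)]        # a 0 marker, then the next bit of reversed h(n)
    return bits + [1]              # the closing 1 marks the end of the pointer


def decode(m):
    """Read the pointer stored at the start of memory m and return the stack length."""
    i, bits = 0, []
    while m[i] == 0:
        bits.append(m[i + 1])
        i += 2
    return int("1" + "".join(map(str, reversed(bits))), 2) - 1


def pop_pointer(m):
    """Decrement the pointer in place; return the set of cells that were touched."""
    touched = {0}
    if m[0] == 1:
        raise IndexError("Error")
    i = 1
    while True:
        touched.add(i)
        if m[i] == 1:              # low-order bit is 1: clear it and stop
            m[i] = 0
            return touched
        m[i] = 1                   # borrow from the next bit
        touched.add(i + 1)
        if m[i + 1] == 1:          # reached the end marker: pointer gets one bit shorter
            m[i - 1] = 1
            return touched
        i += 2


def push_pointer(m):
    """Increment the pointer in place (writing f(x) into its field is omitted)."""
    if m[0] == 1:
        m[0], m[1], m[2] = 0, 0, 1
        return
    i = 1
    while True:
        if m[i] == 0:              # low-order bit is 0: set it and stop
            m[i] = 1
            return
        m[i] = 0                   # carry into the next bit
        if m[i + 1] == 1:          # reached the end marker: pointer gets one bit longer
            m[i + 1], m[i + 2], m[i + 3] = 0, 0, 1
            return
        i += 2
```

Starting from `m = [1] + [0] * 40` (the empty stack), repeated calls to `push_pointer` walk the memory prefix through the interleaved pointers for 1, 2, 3, and so on, and `pop_pointer` walks it back, leaving stale bits only beyond the end marker.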

We can see that we have improved our previous access costs, so that each
stack operation can be implemented with at most $O(\log |d|)$ accesses in the worst
case. In fact, the next result shows that we could expect to do even better for a
POP operation because for a very reasonable probability distribution we can expect
to make, on the average, only a constant number of accesses.

Theorem 6.10. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, let $S = \{0,1\}$, and consider a function
$f: X \to S^+$. Let $\rho: \mathbb{D} \to S^+$ be a TOS pointer representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\Lambda_2^1(|d|)\}_{n_\ell},$$

where $n_i, n_\ell \in \mathbb{N}$. Assume that there is a monotonically nonincreasing
probability distribution $P$ on the stack states:

$$P(|d| = n + 1) \le P(|d| = n).$$

Then it is possible to implement the POP operation so that

$$\mathrm{avg}\,\#[\alpha_{POP}(\rho(d))] \le k,$$

for some $k \in \mathbb{N}$.

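Proof: Consider the algorithm $\alpha_{POP}$ presented in the proof of Theorem 6.9. Note
that 2 accesses are required for $|d| = 2, 4, 6, 8, 10, \ldots$, that 4 accesses are required
for $|d| = 5, 9, 13, 17, \ldots$, that 6 accesses are required for $|d| = 11, 19, 27, 35, \ldots$, etc.
Denote $P(|d| = i)$ by $p_i$. Since $p_{n+1} \le p_n$, we know that

$$p_2 + p_4 + p_6 + p_8 + \cdots \le p_1 + p_3 + p_5 + p_7 + \cdots$$

and so $p_2 + p_4 + p_6 + p_8 + \cdots \le \frac{1}{2}$.

Similarly, $p_5 + p_9 + p_{13} + p_{17} + \cdots \le \frac{1}{4}$,

$p_{11} + p_{19} + p_{27} + p_{35} + \cdots \le \frac{1}{8}$,

$p_{23} + p_{39} + p_{55} + p_{71} + \cdots \le \frac{1}{16}$, etc.

Notice that extra work is required to perform the POP whenever $|d| = 1$, $|d| = 3$,
$|d| = 7$, $|d| = 15$, etc. (i.e., when $|d| = 2^i - 1$ for some $i \in \mathbb{N}$). In that case the whole
pointer is read and one cell is rewritten, which takes $2(i + 1)$ accesses. Since the $2^i$
probabilities $p_0, p_1, \ldots, p_{2^i - 1}$ are each at least $p_{2^i - 1}$ and sum to at most 1, we have
$p_{2^i - 1} \le 2^{-i}$. Thus,

$$\sum_{i=1}^{\infty} p_i \cdot \#[\alpha_{POP}(\rho(|d| = i))] \le \sum_{i=1}^{\infty} \frac{2i}{2^i} + \sum_{i=1}^{\infty} p_{2^i - 1} \cdot 2(i + 1)$$

$$\le \sum_{i=1}^{\infty} \frac{2i}{2^i} + \sum_{i=1}^{\infty} \frac{2(i + 1)}{2^i}$$

$$= 4 + 6$$

$$= 10. \qquad \blacksquare$$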
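The access pattern used in the proof can be checked numerically. The Python sketch below is illustrative only: `pop_cost` and `average_pop_cost` are hypothetical names, the pointer is built from the assumed form of $h_2(n)$ (binary representation of $n + 1$ without its leading 1, inferred from the worked examples), and a backtrack to rewrite the end marker is counted as one extra access. The two distributions tried, uniform on lengths 1 to 1000 and a geometric one, are arbitrary examples of nonincreasing distributions.

```python
from fractions import Fraction


def pop_cost(n):
    """Cell visits made by the POP pointer update on a stack of length n >= 1."""
    theta = [int(b) for b in bin(n + 1)[3:][::-1]]
    m = [c for b in theta for c in (0, b)] + [1]   # interleaved pointer for n
    visits, i = 2, 1                 # cells 0 and 1 are always visited
    while m[i] == 0:
        visits += 1                  # inspect the marker in cell i + 1
        if m[i + 1] == 1:
            return visits + 1        # backtrack one cell to write the new end marker
        i += 2
        visits += 1                  # move on to the next bit cell
    return visits


def average_pop_cost(weights):
    """Average POP cost when weights[n - 1] is proportional to P(|d| = n)."""
    total = sum(weights)
    return sum(Fraction(w, total) * pop_cost(n) for n, w in enumerate(weights, 1))


uniform = average_pop_cost([1] * 1000)
geometric = average_pop_cost([2 ** (60 - n) for n in range(1, 61)])

if __name__ == "__main__":
    print(float(uniform), float(geometric))
```

Both averages stay well below the bound of 10 derived above, because long borrow chains occur only for the rare lengths with many trailing zero bits in $n + 1$.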



The following theorem summarizes the results we have just derived.

Theorem 6.11. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, let $S = \{0,1\}$, and consider a function
$f: X \to S^+$. Let $\rho: \mathbb{D} \to S^+$ be a TOS pointer representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\Lambda_2^1(|d|)\}_{n_\ell},$$

where $n_i, n_\ell \in \mathbb{N}$. Assume that there is a monotonically nonincreasing
probability distribution $P$ on the stack states:

$$P(|d| = n + 1) \le P(|d| = n).$$

Let $k_2, k_3 \in \mathbb{N}$. Then $\Lambda_2^1$ achieves Kraft storage, and it is possible to
implement the stack operations so as to achieve the following access costs:

$$\#[\alpha_{TOP}(\rho(d))] \le 2 \lfloor \log_2(|d| + 1) \rfloor + k_2$$

$$\#[\alpha_{PUSHx}(\rho(d))] \le 2 \lfloor \log_2(|d| + 1) \rfloor + |f(x)| + 3$$

$$\mathrm{avg}\,\#[\alpha_{POP}(\rho(d))] \le k_3.$$

Proof: The result for the POP operation is the result of Theorem 6.10. We obtain
the inequalities for TOP and PUSHx by recalling that $|\Lambda_2^1(n)| = |\ell_2^1(n)|$ and by
making use of Lemma 5.7 and Theorems 5.18 and 6.9. The Kraft storage follows
from Theorem 5.19. ∎

We have chosen to prove these results for the pointer scheme $\Lambda_2^1$, but the
scheme can be extended to include $\Lambda_{|S|}^i$. As it turns out, the access costs we obtain
are even better than for $\Lambda_2^1$, although the results are all of the same order of
growth. Because the details would tend to obscure an understanding of the class of
pointer schemes $\Lambda$, we shall not formally define $\Lambda_{|S|}^i$ for $|S| > 2$ or $i > 1$. But let
us indicate informally how these extensions could be made. Note that we shall
always have

$$|\Lambda_{|S|}^i(n)| = |\ell_{|S|}^i(n)|,$$

and, in fact, the string $\Lambda_{|S|}^i(n)$ is just a rearrangement of the elements in the
string $\ell_{|S|}^i(n)$.

Consider $|S| = 3$ and recall $h_3(n)$ from Table 5.2. Since we want to construct
$\Lambda_3^1(n)$ in such a way that it is a rearrangement of the elements in $\ell_3^1(n)$, recall
that

$$\ell_3^1(n) = h_2(|h_3(n)|) \cdot 2 \cdot h_3(n).$$

In this case the first (pointer) component of $\ell_3^1(n)$ has only about $\log_2(|h_3(n)|)$
elements, whereas the second (list) component has $|h_3(n)|$ elements. So we clearly
cannot just use every other cell for the first component, as we did with $\Lambda_2^1(n)$.
Referring again to Table 5.2, we see that our pointer component has a 0 when the
list component has length 1, has a 1 when the list component has length 2, has 00
when the list component has length 3, etc. So



1 C J1 - 



n 



lh 3 (n)| = 2 2 1_1 G n (i), 

1=1 

where G n denotes G jj. When |h 3 (n)l = 1 then A 3 (n) is of the form 0_, when 
lh 3 (n)| - 2 then A 3 (n) is of the form 1_, etc. This scheme is illustrated in Figure 
6.1. The string $\theta_n$ is written out in blocks of size $2^i$ and the coefficient of each
block, 0 or 1, tells whether there are $2^i$ or $2 \cdot 2^i$ elements, respectively, in that block.
Rather than attempt to say more in words, we refer the reader to Table 6.2.

    $|\theta_n|$    form of $\Lambda_3^1(n)$

     1      0_2
     2      1__2
     3      0_0__2
     4      1__0__2
     5      0_1____2
     6      1__1____2
     7      0_0__0____2
     8      1__0__0____2
     9      0_1____0____2
    10      1__1____0____2
    11      0_0__1________2

Figure 6.1. Outline of scheme for $\Lambda_3^1(n)$.



It is also possible to construct $\Lambda_2^i$ for $i > 1$. The procedure is outlined in
Figure 6.2. Notice that we write the initial part of $\theta_n$, as much as possible, in
blocks of size $1, 2, 2^2, 2^3, 2^4$, etc. Of course, when $|\theta_n| \ne \sum_{i=0}^{k} 2^i$ for some $k$ (i.e.,
$|\theta_n| \ne 1, 3, 7, 15$, etc.), then some digits in $\theta_n$ will be left over. In particular, let

$$r = \min\Big\{ j \;\Big|\; \sum_{i=0}^{j+1} 2^i > |\theta_n| \Big\}.$$

Then we can write the first $\sum_{i=0}^{r} 2^i$ elements in blocks of size powers of 2, each block
preceded by a 0; a 1 indicates when we do not want to continue reading the next

Table 6.2. Construction of pointer representations $\Lambda_3^1$ and $\Lambda$

6.5 The BOS Pointer Representation

For the sake of completeness, let us briefly mention the bottom of stack pointer 
representation. 

Example 6.14. Recall Example 6.10, where we had $X = \{a, b, c, d\}$, $S = \{0, 1, 2, 3\}$,
and $\mathbb{D} = \bigcup_{i=0}^{3} X^i$. The function $f: X \to S^*$ is defined by

    f(a) = 0
    f(b) = 1
    f(c) = 2
    f(d) = 3

and the pointer $\ell: J \to S^*$ is defined by

    ℓ(0) = 0
    ℓ(1) = 1
    ℓ(2) = 2
    ℓ(3) = 3

Then we can define the concatenation-preserving pointer representation $\rho: \mathbb{D} \to S^*$
by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{|d| - i + 1} \cup \{\ell(|d|)\}_0 .$$

For instance,

    ρ(λ)   = 0
    ρ(d)   = 13
    ρ(cab) = 3102

Assuming $L$ is large enough to represent any $d \in \mathbb{D}$ (i.e., $L \ge 4$), let us construct
algorithms to implement the stack operations. In order to perform a POP operation
we need not only decrement the pointer, but the contents of all of the memory cells
will have to be shifted left by one. The shift must proceed from cell 1 upward so
that no element is overwritten before it has been copied.

    α_POP:   if m(0) = 0 then return "Error"
             m(0) ← m(0) - 1
             i ← 1
             while i ≤ m(0) do m(i) ← m(i+1)
                               i ← i + 1

Similarly, the PUSHx operation requires that the contents of each cell be shifted
right by one.

    α_PUSHx: if m(0) = 3 then return "Error"
             m(0) ← m(0) + 1
             temp1 ← m(0)
             temp2 ← m(1)
             m(1) ← f(x)
             i ← 2
             while i ≤ temp1 do temp2 ↔ m(i)
                                i ← i + 1

The TOP operation is much easier.

    α_TOP:   if m(0) = 0 then return "Error"
             return m(1)
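The bottom of stack layout of Example 6.14 can be tried out with a short Python sketch. This is an illustration, not the thesis's code: the class name `BOSStack` and the dictionary `F` (standing in for the function $f$) are hypothetical, and the shift loops are written in the order that preserves the elements. Cell 0 holds the length and cell 1 always holds the top element.

```python
F = {"a": 0, "b": 1, "c": 2, "d": 3}    # the element code f from Example 6.14


class BOSStack:
    """Stack whose top element always sits in cell 1; cell 0 holds the length."""

    def __init__(self, max_len=3):
        self.max_len = max_len
        self.m = [0] * (max_len + 1)

    def top(self):
        if self.m[0] == 0:
            raise IndexError("Error")
        return self.m[1]

    def push(self, x):
        if self.m[0] == self.max_len:
            raise OverflowError("Error")
        self.m[0] += 1
        carried = F[x]
        for i in range(1, self.m[0] + 1):   # shift every element right by one cell
            self.m[i], carried = carried, self.m[i]

    def pop(self):
        if self.m[0] == 0:
            raise IndexError("Error")
        self.m[0] -= 1
        for i in range(1, self.m[0] + 1):   # shift the remaining elements left by one
            self.m[i] = self.m[i + 1]


if __name__ == "__main__":
    s = BOSStack()
    for x in "cab":
        s.push(x)
    print(s.m)      # memory contents for the stack c, a, b
```

Every PUSH or POP touches the pointer and each occupied element cell, which is the source of the linear access cost of this representation, while TOP reads at most two cells.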



We can extend the pointer scheme in the previous example and define the BOS
pointer representation in the obvious way.

Definition. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$, for $k \in \mathbb{N}$, and consider a function $f: X \to S^+$. Let
$\rho: \mathbb{D} \to S^+$ be any pointer representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\ell(|d|)\}_{n_\ell},$$

where $n_i, n_\ell \in \mathbb{N}$ (for any $0 < i \le k$) and where $\ell$ is a representation
$\ell: J \to S^+$. Then $\rho$ is a bottom of stack (BOS) pointer representation.

The types of arguments used in the preceding sections can be used to
determine the access costs for implementing the BOS pointer representation. For
PUSHx or POP, the elements in the stack will all have to be moved, requiring an
access to each $n_i$ field and also reading the entire pointer component (assuming the
pointer achieves Kraft storage). A TOP operation is, however, cheap since the top
element is always located in the same field, assuming, of course, that there is a top element.

Theorem 6.12. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$ for $k \in \mathbb{N}$ and let the function $f: X \to S^+$ be a
representation that achieves Kraft storage. Consider the BOS pointer
representation $\rho: \mathbb{D} \to S^+$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\ell(|d|)\}_{n_\ell},$$

where $n_i, n_\ell \in \mathbb{N}$ and where $\ell$ is any representation $\ell: J \to S^+$. Then any
implementation of the stack operations will have the following access costs:

$$\#[\alpha_{TOP}(\rho(d))] \ge 1$$

$$\#[\alpha_{PUSHx}(\rho(d))] \ge |\ell(|d|)| + |d| + 1$$

$$\#[\alpha_{POP}(\rho(d))] \ge \begin{cases} |\ell(|d|)| + |d| & \text{if } |d| \ne 0 \\ 1 & \text{if } |d| = 0 \end{cases}$$

We do not formally prove this theorem because the proof is similar to 
arguments we have already made and because we can now already see that the 
stack operations would have higher access costs than we would in general want. 

Note that the four stack representations we have discussed may all achieve
Kraft storage, but their access costs differ greatly. We summarize in Table 6.3 some
of the lower bounds we have determined in this chapter.

CHAPTER 7
QUEUES

The same framework that we have developed in this thesis can also be used to 
analyze queues. Although we shall not in this chapter prove any results, let us 
point out some of the complexities inherent in queues that are not present in stacks. 

Recall that a queue differs from a stack in that items are inserted at one end 
and deleted from the other. If we want to achieve Kraft storage in a representation 
of a queue, we know that we can use only a single pointer. This, however, does 
not allow multiple representations and so updating operations will necessarily have 
high access costs. In all of the examples we consider in this chapter, we shall 
assume a problem domain alphabet $X = \{a, b, c, d\}$ and assume that $|S|$ is large
enough so that a pointer always fits in a single cell in the cases we consider. We
shall also assume that $a \in X$ is represented by $0 \in S$, b by 1, c by 2, and d by 3.

Example 7.1. Suppose we have a three element queue. Consider implementing
such a queue with a single pointer and holding the other end fixed.

a) Let the rear of the queue be fixed; i.e., all insertions are made to the same cell.
Thus, the entire contents of the queue must be slid each time an insertion is made.
On the other hand, we need only decrement the pointer to delete an item from the
queue. For instance, suppose our queue initially has three elements inserted: b, a,
c. So $m(0) = 2$, $m(1) = 0$, and $m(2) = 1$:

    2 0 1 _     Pointer to front: 3

If we DELETE an item we are left with

    2 0 _ _     Pointer to front: 2

If we now INSERT(d), we obtain

    3 2 0 _     Pointer to front: 3

Notice that each of the elements already on the queue had to be moved when we
made an INSERT. With this scheme, a DELETE operation requires only a single
access in order to decrement the pointer. An INSERT operation, however, requires
$|d| + 1$ accesses, where $|d|$ is the initial queue size.

b) If the front of the queue were stationary, then it would be an insertion which
would be easy to perform. As above, suppose we have initially inserted b, a, c on
the queue:

    1 0 2 _     Pointer to rear: 3

A DELETE operation requires moving the contents of each element in the queue:

    0 2 _ _     Pointer to rear: 2

Now an INSERT(d) is simple:

    0 2 3 _     Pointer to rear: 3

Using this second scheme, an INSERT operation requires two accesses, one to the
pointer and one to insert the new element. On the other hand, the DELETE
operation requires accessing every element in the queue (as well as the pointer),
$|d| + 1$ accesses. ∎

The tradeoff in the preceding example suggests that we do not want to
consider separately the access costs for the INSERT and DELETE operations;
instead, we might want to consider the cost of a DELETE-INSERT pair of
operations. In Example 7.1a we found that an INSERT had cost $|d| + 1$ and a
DELETE had cost 1, a total cost of $|d| + 2$ accesses for the DELETE-INSERT pair.
In Example 7.1b, INSERT had cost 2 and a DELETE had cost $|d| + 1$, a total cost of
$|d| + 3$.

The expense involved in the INSERT or DELETE operation in Example 7.1
was due to the fact that we were forced to always maintain one end of the queue
fixed. Of course, if we were to allow two pointers, then we would not have this
problem. Instead, let us consider a scheme where we allow a queue to have one end
in one of, say, two positions.

Example 7.2. Reconsider Example 7.1b but assume that the pointer is large enough
that one bit can be reserved to indicate whether the "fixed" end of the queue is in
cell 0 or in cell 1. Suppose our initial queue state is, as before:

    1 0 2 _     Pointer to rear: 3     Front: 0

Now if we do a DELETE, we do not need to move any of the list elements:

    _ 0 2 _     Pointer to rear: 3     Front: 1

An INSERT(d) operation gives:

    _ 0 2 3     Pointer to rear: 4     Front: 1

Unfortunately, another DELETE will require moving the queue:

    2 3 _ _     Pointer to rear: 2     Front: 0

Finally one more INSERT(a):

    2 3 0 _     Pointer to rear: 3     Front: 0

This effectively brings us back to our initial state (although the actual queue
elements differ). Notice that these four operations we performed required, in
order, 1, 2, $|d| + 2$, and 2 accesses, where $|d|$ refers to the size of our initial queue
state before the two pairs of DELETE-INSERT operations were performed. ∎

So in Example 7.2, by reserving one bit of the pointer to indicate the location of the
front of the queue, we used a total of $|d| + 7$ accesses, only $\frac{|d| + 7}{2}$ accesses on the
average for a DELETE-INSERT. On the other hand, without using this extra bit
we in Example 7.1 were forced to make $|d| + 2$ accesses for a DELETE-INSERT. So
we were able to not only delay the heavy cost of sliding the queue, but we in fact
have decreased the average cost of a DELETE-INSERT pair. Let us use the same
trick again and reserve two bits to tell us where the front of the queue is located;
i.e., the front of the queue may be in any of cells 0, 1, 2, 3.

Example 7.3. Given an initial queue 2 1 0 3, let us perform a sequence of four
DELETE-INSERT pairs of operations, keeping track of the numbers of accesses.

    Operation      Memory              Accesses

                   2 1 0 3
    DELETE:        _ 1 0 3             1
    INSERT(a):     _ 1 0 3 0           2
    DELETE:        _ _ 0 3 0           1
    INSERT(c):     _ _ 0 3 0 2         2
    DELETE:        _ _ _ 3 0 2         1
    INSERT(b):     _ _ _ 3 0 2 1       2
    DELETE:        0 2 1               |d| + 4
    INSERT(a):     0 2 1 0             2

This gives a total of $|d| + 15$ accesses, an average of $\frac{|d| + 15}{4}$ accesses per
DELETE-INSERT pair. ∎

In general, if we reserve $k$ bits of the pointer to indicate the location of one end of
the queue, then there are $2^k$ possible representations of each queue, and a
DELETE-INSERT pair requires, on the average,

$$\frac{|d| + 2^k + 3 \cdot (2^k - 1) + 2}{2^k} = \frac{|d|}{2^k} + 4 - \frac{1}{2^k}$$

accesses.
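The arithmetic behind this average can be checked with a few lines of Python. The sketch below is only a tally, not a queue simulation: the function names `pair_costs` and `average_pair_cost` are hypothetical, and the per-operation costs are taken as stated in Examples 7.1b, 7.2, and 7.3 (a cheap DELETE costs 1 access, every INSERT costs 2, and the one DELETE per cycle that slides the queue costs $|d| + 2^k$).

```python
from fractions import Fraction


def pair_costs(d_len, k):
    """Costs of the 2**k DELETE-INSERT pairs in one cycle, with k reserved pointer bits."""
    cheap = [1 + 2] * (2 ** k - 1)     # DELETE moves only the front marker, then INSERT
    slide = (d_len + 2 ** k) + 2       # the DELETE that slides the queue, then INSERT
    return cheap + [slide]


def average_pair_cost(d_len, k):
    """Average number of accesses per DELETE-INSERT pair over one full cycle."""
    costs = pair_costs(d_len, k)
    return Fraction(sum(costs), len(costs))


if __name__ == "__main__":
    for k in range(4):
        print(k, average_pair_cost(4, k))
```

For $k = 1$ and $k = 2$ the cycle totals are $|d| + 7$ and $|d| + 15$, matching the two examples, and each extra reserved bit halves the share of the sliding cost carried by every pair.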

Thus, we have seen that a one pointer scheme allows no multiple 
representations and we may achieve Kraft storage. Using a two pointer scheme, the 
queue could be located anywhere in memory (within the range of the pointers) and 
may, in fact, drift throughout memory. An intermediate scheme has a single 
pointer which has enough room for $|d|$, with one or more extra bits reserved to
indicate the location of one end of the queue. In this latter case, we not only defer 
but actually save in our access cost. This illustrates not only a storage-access 
tradeoff but also a tradeoff with multiplicity of representation, and we have a nice 
continuum between the one and two pointer cases. 

Suppose we do want to achieve Kraft storage and are using a single pointer. It
is interesting to consider how many accesses are required in order to perform a
DELETE-INSERT pair of operations. If the queue is always of a fixed size $k$ (i.e.,
the only operations performed are DELETE-INSERT(a) pairs), then, somewhat
surprisingly, it is possible to represent the queues in memory in such a way that the
average number of cells accessed is a constant independent of the length $|d|$. On
the other hand, suppose we insist that the representation function $\rho$ have the
constraint that $\rho(d)$ is a permutation of $d$ and that $d(i)$ always maps to the same
memory cell(s), for all $0 < i \le |d|$. Then it can, in fact, be shown that a
DELETE-INSERT pair of operations performed on all queues of a fixed length k 
will have an average access cost of at least ( ' | Z ) • k. Thus, for most natural 
encoding schemes it will be necessary to access essentially $|d|$ cells.

CHAPTER 8

CONCLUSIONS

In this thesis we have explored what it means for a list to be
information-theoretically optimal, in the sense that it achieves Kraft storage and
Kraft access. We first examined the full set of table lookup questions and showed
that if we are considering problem domains of the form $\mathbb{D} = \bigcup X^i$, then it is
possible to achieve both bounds simultaneously only for domains $\mathbb{D} = X^n$ and
$\mathbb{D} = \{\lambda\} \cup X^n$. This corresponds to a notion of independence; essentially, it must
be the case that no matter what the value $d(i) \in X$, then $d(i+1)$ might take on any
value in $X$. If we were to determine that position $i$ of the list is empty, then it would
have to be the case that position $i + 1$ is empty as well, and we would not have
independence. Of course, we did see that there is a perhaps surprising exception,
namely, when $\mathbb{D} = \{\lambda\} \cup X^n$ and $|S| = 2$.

As a consequence of this work, we were able to show that it is never possible
to achieve both Kraft storage and Kraft access for many common list representation
schemes. The only exception was for a fixed size representation, when $\mathbb{D} = X^n$.
Since we are here primarily interested in variable-length lists, it is clear that we will
not be able to achieve both.

We discussed four natural stack representation schemes: TOS endmarker, BOS
endmarker, TOS pointer, and BOS pointer. We were able to obtain fairly tight
lower bounds on access costs for performing POP, PUSHx, and TOP operations;
those results are summarized in Table 6.3. It is shown that endmarker
representations are necessarily expensive to update. On the constructive side, we
developed a representation scheme for a TOS pointer that is storage optimal and
does quite well for access. Assuming a monotonically nonincreasing probability
distribution on stack lengths, we were able to obtain the following access costs:

$$\#[\alpha_{TOP}(\rho(d))] \le 2 \lfloor \log_2(|d| + 1) \rfloor + k_1 = O(\log |d|)$$

$$\#[\alpha_{PUSHx}(\rho(d))] \le 2 \lfloor \log_2(|d| + 1) \rfloor + k_2 = O(\log |d|)$$

$$\mathrm{avg}\,\#[\alpha_{POP}(\rho(d))] \le k_3,$$

for $k_1, k_2, k_3 \in \mathbb{N}$. The bounds we obtained give an indication as to why pointer
representations are so commonly used in the practical implementation of stacks.

In the discussion of stacks, we were forced to examine separately several 
classes of representations. It would be nice if there were some more general 
characterization that would allow us to make more general statements. For instance, 
is it possible for any implementation to perform both a PUSHx and a POP in a
constant number of accesses?

The model that we used is capable of more generalization. For instance,
instead of considering access costs for performing only a single operation, we might
wish to perform a sequence of operations. Also, our definition of access or storage
costs could be altered to correspond to the desired application; we might even be
able to consider some sort of hierarchical memory structure.

There remains a great deal of work to be done. Perhaps the most obvious is 
the need to apply the techniques used in this thesis in order to examine other types 
of lists. We briefly discussed queues, but it is clear that queues raise a lot of issues 
that were not present with stacks. The flavor of some preliminary results was 
indicated in that chapter. It appears that dequeues are a straightforward extension 
of queues, but there remain many other types of lists to be explored. In addition, 
it would be interesting to know whether similar arguments could be applied to trees. 
Some of the techniques discussed may also be useful in the analysis of hashing 
tables. 
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